READINGS  AND   PROBLEMS 
IN    STATISTICAL    METHODS 




THE  MACMILLAN  COMPANY 

NEW YORK • BOSTON • CHICAGO • DALLAS
ATLANTA   •    SAN    FRANCISCO 

MACMILLAN  &  CO.,  Limited 

LONDON  •  BOMBAY  •  CALCUTTA 
MELBOURNE 

THE  MACMILLAN  CO.  OF  CANADA,  Ltd. 

TORONTO 


READINGS  AND  PROBLEMS 

IN 

STATISTICAL  METHODS 


BY 


HORACE   SECRIST,   Ph.D. 

PROFESSOR   OF   ECONOMICS   AND   STATISTICS 

NORTHWESTERN    UNIVERSITY 

DIRECTOR,    THE   BUREAU    OF   BUSINESS   RESEARCH 

NORTHWESTERN    UNIVERSITY    SCHOOL 

OF   COMMERCE 

AUTHOR OF "AN INTRODUCTION TO STATISTICAL METHODS"


THE   MACMILLAN   COMPANY 

1920 

All rights reserved


COPYRIGHT, 1920,

By   the  MACMILLAN   COMPANY. 


Set up and electrotyped. Published October, 1920.


Norwood Press

J. S. Cushing Co. — Berwick & Smith Co.

Norwood,  Mass.,  U.S.A. 


CONTENTS

CHAPTER

I. Meaning and Application of Statistics and Statistical Methods
     1. Scientific Method — Its Scope and Meaning
        Review
     2. Why Statistics and Its Methods
        Review
     3. Statistical Control Including Costs as a Factor in Production
        Review
     4. Scientific Methods — The Method of Investigation in Relation to Business Cycles
        Review
     5. The Statistical Method of Discovering and Widening Markets
        Review

II. Sources and Collection of Statistical Data
     6. Statistics of Unemployment
        Review
     7. A List of Series Now Available Showing Volume of Production in the United States
        Review
     8. Sampling of Coal
        Review
     9. Government Crop Reports
        Review
    10. Sampling as an Alternative to a Count
        Review
    11. Sampling in the Development of Markets
        Review
    12. The Measurement of the Rate of Factory Output
        Review
    13. What's in a Name — The Cause of Death
        Review
    14. Statistical Standards in the Collection of Facts

III. Units of Measurements in Statistical Studies
    15. The Nature and Conditions of Statistical Measurement
        Review
    16. A Mile of Track
        Review
    17. Accidents in Public Utility Statistics
        Review
    18. Industrial Accident Rates
        Review
    19. Some Illogical Units in Railway Statistics
        Review

IV. Illustrations of Methods in Collecting Statistical Data
    20. Study of Wages — Method
        Review
    21. Statistics of the United States Shipping Board
    22. Points to Be Considered in the Use and Form of Questionnaires
        Review
    23. Editing of Schedules
        Review
    24. Review Problems

V. Classification — Tabular Presentation
    25. The Purpose and Method of Tabulation
        Review
    26. Standardization of the Construction of Statistical Tables
        Review
    27. Statistical Standardization in Tabulating Facts
    28. A Census Card
    29. Review Problems

VI. Diagrammatic and Graphic Presentation
    30. Rules for Diagrammatic Presentation of Statistical Data
    31. Statistical Standards in the Graphic Presentation of Facts
    32. The Theory and Justification of Curve Smoothing
        Review
    33. Some Advantages of the Logarithmic Scale in Statistical Diagrams
    34. Review Problems

VII. Averages as Types
    35. The Use of Averages in Presenting Wage Statistics
    36. Weighted Averages and Crop Reporting
    37. Compensating Errors — The Logic of Large Numbers in Crop Reporting
        Review
    38. The Calculation of the Average Tariff Duty or Rate
    39. Averages as Measures of Street Car Utilization
        Review
    40. Car Seat Mile Averages and Ratios
        Review
    41. Review Problems

VIII. Principles of Index Number Making and Using
    42. Method of Computing Index Numbers — Bureau of Crop Estimates
    43. The Why and How of Stock Index Numbers
        Review
    44. Weighting and the Making of Stock Index Numbers
        Review
    45. Conclusions on the Making of Stock Index Numbers
        Review
    46. Review Problems

IX. Description and Summarization — Dispersion and Skewness
    47. The Nature of Statistical Knowledge
        Review
    48. The Horizontal Zero in Frequency Diagrams
    49. Review Problems

X. Comparison — Correlation
    50. The Limits of Statistics
    51. Difficulties in International Statistical Comparisons
    52. Difficulties in International Comparison of Wages
    53. The Coefficient of Correlation
    54. Statistical Standards in the Interpretation of Facts
    55. Review Problems

Index


INTRODUCTION 

The  selections  included  in  this  book  were  chosen  to  illus- 
trate concretely  the  attitude  of  mind  in  which  statistical 
analysis  must  be  undertaken,  and  to  develop  logically  the 
steps  and  processes  through  which  statistical  data  must  be 
carried  in  order  to  be  used  as  bases  for  logical  inferences. 
They  constitute  within  themselves  an  independent  treatment 
of  statistical  principles ;  but,  undoubtedly,  will  have  their 
greatest  value  when  used  in  connection  with  a  text  on  statisti- 
cal methods.  They  are  intended  primarily  to  be  used  in  this 
manner. 

The use of statistics is consciously emphasized. "Embalmed"
statistics have no part in the treatment, as they
have no place in the writer's interest. The collection, use,
and  interpretation  of  statistical  data  are  justified  largely, 
if  not  solely,  in  the  service  which  they  have  for  planning, 
whether  it  is  related  to  questions  of  social  control,  business 
policy,  or  statecraft. 

It  has  seemed  wise  to  accompany  the  selections  with 
pertinent,  thought-provoking  questions,  which  students  and 
others  may  use  as  a  basis  for  criticism  and  constructive 
analysis.  Accordingly,  review  questions  are  made  com- 
ponent parts  of  the  treatment.  It  is  not  intended  that 
these  shall  be  used  solely  as  a  means  of  making  easy  the 
assimilation  of  the  contents  of  the  selections,  but  rather,  that 
they  shall  serve  to  connect  the  subject  matter  with  the  ex- 
perience and  training  of  the  reader. 

Review  Problems  have  been  added  at  the  close  of  those 
chapters, the subject matter of which seems to lend itself
to  concrete  application  or  to  laboratory  use.  It  is  the 
teacher's obligation to make his laboratory exercises of
interest  to  those  whom  he  asks  to  take  part  in  them,  and  to 
couple  them  with  concrete  business,  industrial  and  social 
experiences.  The  make-work  problems  to  which  students 
are  too  often  assigned,  as  part  of  their  laboratory  work,  not 
only  fail  to  arouse  intellectual  interest,  but  have  the  effect 
of  divorcing  the  laboratory  from  the  life  which  the  student 
is  living.  They  are  too  often  looked  upon  as  tasks  or 
penalties,  rather  than  as  opportunities  to  take  part  in  ex- 
plaining, illustrating,  and  summarizing  data  which  have  to 
be  manipulated  before  they  can  be  used  as  bases  for  business 
and  social  judgments. 

Laboratory  problems  should  be  chosen  from  business  and 
social  fields,  and  should  include  topics  in  which  the  student 
himself  has  an  interest,  and  which  he  would  be  willing  and 
eager  to  study  statistically,  in  order  more  fully  to  under- 
stand. It  is  not  difficult  to  select  problems  of  this  character 
and  to  secure  data  relating  to  them.  In  no  other  single 
problem,  in  the  writer's  experience  as  a  teacher,  has  so  much 
interest  been  developed  on  the  part  of  his  students  in  sta- 
tistics, as  in  the  study  of  expenditures  for  food  at  a  local 
cafeteria.  Theater  tickets,  types  of  business  buildings, 
real  estate  valuations,  show  window  decorations,  classified 
advertisements,  types  of  news  items,  stock  and  bond  quota- 
tions, money  rates,  etc.,  all  lend  themselves  to  statistical 
treatment  and  arouse  statistical  interest.  The  writer  has 
never  been  at  a  loss  to  find  problems  which  create  interest 
and  which  are  worthy  of  study. 

It  is,  therefore,  with  considerable  hesitation  that  so- 
called  Review  Problems  have  been  included  in  this  book. 
The  repeated  requests  on  the  part  of  instructors  in  Statistics 
for laboratory problems are the primary excuse which the
writer  has  for  including  them  here.  It  is  hoped  that  they 
will  be  found  of  some  interest  to  instructors  in  solving  their 
laboratory difficulties, or in calling their attention to the
problems  immediately  about  them  which  may  be  used  in 
their  stead. 

The  frequent  references  to  the  Text  in  the  Reviews  and 
Review  Problems  are  to  the  author's  Introduction  to  Statistical 
Methods. While the Introduction and Readings are intended
to  be  used  together,  either  of  them  may  be  used  separately 
for  text  or  general  purposes.  It  has  seemed  wise  to  employ 
the  same  chapter  headings  in  the  two  volumes  and  this  plan 
is  followed.  Chapters  VI  and  VII,  VIII,  IX  and  X,  XI, 
and  XII  in  the  Introduction,  however,  become  Chapters  VI, 
VII,  VIII,  IX,  and  X,  respectively,  in  the  Readings. 

It  is  a  pleasure  for  the  writer  to  acknowledge  his  obliga- 
tion to  the  authors  and  publishers  of  the  selections  included 
for  the  privilege  of  reprinting  them,  and  to  express  his  ap- 
preciation of  the  value  which  they  have  been  to  him  in  clari- 
fying his  own  ideas  on  the  meaning,  function,  and  use  of 
statistical  methods  in  the  understanding  of  business  and  social 
problems.  It  is  the  writer's  hope  that  they  will  be  equally 
interesting  to  those  into  whose  hands  this  volume  may  come. 

Horace  Secrist. 

Northwestern  University, 

Evanston-Chicago,  Illinois. 

June,  1920. 



READINGS  AND  PROBLEMS 
IN  STATISTICAL  METHODS 


CHAPTER  I 

THE  MEANING  AND  APPLICATION  OF  STATISTICS 
AND  STATISTICAL  METHODS 

Scientific Method — Its Scope and Meaning *

Within  the  past  forty  years  so  revolutionary  a  change 
has  taken  place  in  our  appreciation  of  the  essential  facts  in 
the  growth  of  human  society,  that  it  has  become  necessary 
not  only  to  rewrite  history,  but  to  profoundly  modify  our 
theory  of  life  and  gradually,  but  none  the  less  certainly,  to 
adapt  our  conduct  to  the  novel  theory.  The  insight  which 
the  investigations  of  Darwin,  seconded  by  the  suggestive 
but  far  less  permanent  work  of  Spencer,  have  given  us  into 
the  development  of  both  individual  and  social  life,  has  com- 
pelled us  to  remodel  our  historical  ideas  and  is  slowly  widen- 
ing and  consolidating  our  moral  standards.  This  slowness 
ought not to dishearten us, for one of the strongest factors
of social stability is the inertness, nay, rather active hostility,
with which human societies receive all new ideas.
It is the crucible in which the dross is separated from the
genuine metal, and which saves the body-social from a succession
of unprofitable and possibly injurious experimental
variations. That the reformer should often be also the
martyr is, perhaps, a not over-great price to pay for the
caution with which society as a whole must move; it may
require years to replace a great leader of men, but a stable
and efficient society can only be the outcome of centuries
of development.

* Adapted with permission from Pearson, Karl, The Grammar of Science,
Second edition, revised and enlarged, Chapter I, pp. 1-14. A. and C. Black,
London.

If  we  have  learned,  it  may  be  indirectly,  from  the  writings 
of  Darwin  that  the  methods  of  production,  the  mode  of 
holding  property,  the  forms  of  marriage,  the  organizations  of 
the  family  and  of  the  commune  are  the  essential  factors  which 
the  historian  has  to  trace  in  the  growth  of  human  society ; 
if in our history books we are ceasing to head periods with
the  names  of  monarchs  and  to  devote  whole  paragraphs  to 
their  mistresses,  still  we  are  far  indeed  from  clearly  grasping 
the  exact  interaction  of  the  various  factors  of  social  evolu- 
tion, or  from  understanding  why  one  becomes  predominant 
at  this  or  that  epoch.  We  can  indeed  note  periods  of  great 
social activity and others of apparent quiescence, but it is
probably only our ignorance of the exact course of social
evolution which leads us to assign fundamental changes in
social institutions either to individual man or to reformations
and revolutions. We associate, it is true, the German
Reformation  with  a  replacement  of  collectivist  by  individ- 
ualist standards,  not  only  in  religion  but  also  in  handicraft, 
art, and politics. The French Revolution in like manner
is the epoch from which many are inclined to date the rebirth
of those social ideas which have largely remolded the
medieval relations of class and caste, relations little affected
by  the  sixteenth-century  Reformation.  Coming  somewhat 
nearer  to  our  own  time  we  can  indeed  measure  with  some 
degree of accuracy the social influence of the great changes
in  the  methods  of  production,  the  transition  from  home  to 
capitalistic industry, which transformed English life in the
first  half  of  this  century,  and  has  since  made  its  way  through- 
out the  civilized  world.  But  when  we  actually  reach  our 
own  age,  an  age  one  of  the  most  marked  features  of  which 
is the startlingly rapid growth of the natural sciences and
their  far-reaching  influence  on  the  standards  of  both  the 
comfort  and  the  conduct  of  human  life,  we  find  it  impossible 
to  compress  its  social  history  into  the  bald  phrases  by  which 
we  attempt  to  connote  the  characteristics  of  more  distant 
historical  epochs.  .  .  . 

The  contest  of  opinion  in  nearly  every  field  of  thought  — 
the  struggle  of  old  and  new  standards  in  every  sphere  of 
activity, in religion, in commerce, in social life — touches
the  spiritual  and  physical  needs  of  the  individual  far  too 
nearly  for  him  to  be  a  dispassionate  judge  of  the  age  in 
which  he  lives.  That  we  play  our  parts  in  an  era  of  rapid 
social  change  can  scarcely  be  doubted  by  any  one  who  re- 
gards attentively  the  marked  contrasts  presented  by  our 
modern society. It is an era alike of great self-assertion
and  of  excessive  altruism;  we  see  the  highest  intellectual 
power  accompanied  by  the  strangest  recrudescence  of  super- 
stition; there  is  a  strong  socialist  drift  and  yet  not  a  few 
remarkable individualist teachers; the extremes of religious
faith and of unequivocal freethought are found jostling
each other. Nor do these opposing traits exist only
in  close  social  juxtaposition.  The  same  individual  mind, 
unconscious  of  its  own  want  of  logical  consistency,  will 
often  exhibit  our  age  in  microcosm. 

It  is  little  wonder  that  we  have  hitherto  made  small  way 
towards a common estimate of what our time is really contributing
to the history of human progress. The one man
finds in our age a restlessness, a distrust of authority, a
questioning  of  the  basis  of  all  social  institutions  and  long- 
established  methods  —  characteristics  which  mark  for  him 
a  decadence  of  social  unity,  a  collapse  of  the  time-honored 
principles  which  he  conceives  to  be  the  sole  possible  guides 
of  conduct.  A  second  man  with  a  different  temperament 
pictures  for  us  a  golden  age  in  the  near  future,  when  the  new 
knowledge  shall  be  diffused  through  the  people,  and  when 
those  modern  notions  of  human  relations,  which  he  finds 
everywhere  taking  root,  shall  finally  have  supplanted  worn- 
out  customs. 

One  teacher  propounds  what  is  flatly  contradicted  by  a 
second.  "We  want  more  piety,"  cries  one;  "We  must 
have  less,"  retorts  another.  "State  interference  in  the 
hours  of  labor  is  absolutely  needful,"  declares  a  third; 
"It  will  destroy  all  individual  initiation  and  self-depend- 
ence," rejoins  a  fourth.  "The  salvation  of  the  country 
depends  upon  the  technical  education  of  its  work  people," 
is  the  shout  of  one  party;  "Technical  education  is  merely 
a  trick  by  which  the  employer  of  labor  thrusts  upon  the 
nation  the  expense  of  providing  himself  with  better  human 
machines,"  is  the  prompt  answer  of  its  opponents.  "We 
need  more  private  charity,"  say  some;  "All  private  charity 
is  an  anomaly,  a  waste  of  the  nation's  resources  and  a 
pauperizing  of  its  members,"  reply  others.  "Endow  sci- 
entific research  and  we  shall  know  the  truth,  when  and 
where  it  is  possible  to  ascertain  it";  but  the  counterblast 
is  at  hand:  "To  endow  research  is  merely  to  encourage 
the  research  for  endowment;  the  true  man  of  science  will 
not  be  held  back  by  poverty,  and  if  science  is  of  use  to  us, 
it  will  pay  for  itself."  Such  are  but  a  few  samples  of  the 
conflict of opinion which we find raging around us. The
prick of conscience and the spur of highly wrought sympathy
have succeeded in arousing a wonderful restlessness
in  our  generation  —  and  this  at  a  time  when  the  advance 
of  positive  knowledge  has  called  in  question  many  old 
customs  and  old  authorities.  .  .  . 

The  state  has  become  in  our  day  the  largest  employer  of 
labor,  the  greatest  dispenser  of  charity,  and,  above  all, 
the  schoolmaster  with  the  biggest  school  in  the  community. 
Directly  or  indirectly  the  individual  citizen  has  to  find 
some  reply  to  the  innumerable  social  and  educational  prob- 
lems of  the  day.  He  requires  some  guide  in  the  determina- 
tion of  his  own  action  or  in  the  choice  of  fitting  representa- 
tives. He  is  thrust  into  an  appalling  maze  of  social  and 
educational  problems ;  and  if  his  tribal  conscience  has 
any  stuff  in  it,  he  feels  that  these  problems  ought  not  to  be 
settled,  so  far  as  he  has  the  power  of  settling  them,  by  his 
own  personal  interests,  by  his  individual  prospects  of  profit 
or  loss.  He  is  called  upon  to  form  a  judgment  apart,  if 
it  possibly  may  be,  from  his  own  feelings  and  emotions  — 
a  judgment  in  what  he  conceives  to  be  the  interests  of 
society at large. It may be a difficult thing for the large
employer of labor to form a right judgment in matters of
factory legislation, or for the private schoolmaster to see
clearly  in  questions  of  state-aided  education.  None  the 
less  we  should  probably  all  agree  that  the  tribal  conscience 
ought  for  the  sake  of  social  welfare  to  be  stronger  than 
private  interests,  and  that  the  ideal  citizen,  if  he  existed, 
would  form  a  judgment  free  from  personal  bias. 

Science  and  Citizenship 

How  is  such  a  judgment  —  so  necessary  in  our  time  with 
its  hot  conflict  of  individual  opinions  and  its  increased 
responsibility  for  the  individual  citizen  —  how  is  such  a 
judgment to be formed? In the first place it is obvious
that  it  can  only  be  based  on  a  clear  knowledge  of  facts,  an 
appreciation  of  their  sequence  and  relative  significance. 
The  facts  once  classified,  once  understood,  the  judgment 
based upon them ought to be independent of the individual
mind which examines them. Is there any other
sphere,  outside  that  of  ideal  citizenship,  in  which  there  is 
habitual use of this method of classifying facts and forming
judgments upon them? For if there be, it cannot fail
to be suggestive as to methods of eliminating individual
bias ;  it  ought  to  be  one  of  the  best  training  grounds  for 
citizenship.  The  classification  of  facts  and  the  formation 
of  absolute  judgments  upon  the  basis  of  this  classification  — 
judgments  independent  of  the  idiosyncrasies  of  the  individual 
mind  —  essentially  sum  up  the  aim  and  method  of  modern 
science.* The scientific man has above all things to strive
at  self-elimination  in  his  judgments,  to  provide  an  argu- 
ment which  is  as  true  for  each  individual  mind  as  for  his 
own.  The  classification  of  facts,  the  recognition  of  their  se- 
quence and  relative  significance  is  the  function  of  science, 
and  the  habit  of  forming  a  judgment  upon  those  facts  un- 
biased by  personal  feeling  is  characteristic  of  what  may 
be  termed  the  scientific  frame  of  mind.  The  scientific 
method of examining facts is not peculiar to one class of phenomena
and to one class of workers; it is applicable to social
as  well  as  to  physical  problems,  and  we  must  carefully  guard 
ourselves  against  supposing  that  the  scientific  frame  of  mind 
is a peculiarity of the professional scientist.

* The italics are not found in the original.

The First Claim of Modern Science

I have gone a rather roundabout way to reach my definition
of science and scientific method. But it has been of
purpose, for in the spirit — and it is a healthy spirit —
of  our  age  we  are  accustomed  to  question  all  things  and 
to  demand  a  reason  for  their  existence.  The  sole  reason 
that  can  be  given  for  any  social  institution  or  form  of 
human  activity  —  I  mean  not  how  they  came  to  exist, 
which is a matter of history, but why we continue to encourage
their existence — lies in this: their existence tends
to  promote  the  welfare  of  human  society,  to  increase  social 
happiness,  or  to  strengthen  social  stability.  In  the  spirit 
of  our  age  we  are  bound  to  question  the  value  of  science ; 
to  ask  in  what  way  it  increases  the  happiness  of  mankind 
or  promotes  social  efficiency.  We  must  justify  the  existence 
of  modern  science,  or  at  least  the  large  and  growing  de- 
mands which  it  makes  upon  the  national  exchequer.  Apart 
from  the  increased  physical  comfort,  apart  from  the  intel- 
lectual enjoyment  which  modern  science  provides  for  the 
community  .  .  .  there  is  another  and  more  fundamental 
justification  for  the  time  and  energy  spent  in  scientific  work. 
From  the  standpoint  of  morality,  or  from  the  relation  of  the 
individual  unit  to  other  members  of  the  same  social  group, 
we  have  to  judge  each  human  activity  by  its  outcome  in 
conduct.  How,  then,  does  science  justify  itself  in  its  in- 
fluence on  the  conduct  of  men  as  citizens?  I  assert  that 
the  encouragement  of  scientific  investigation  and  the  spread 
of  scientific  knowledge  by  largely  inculcating  scientific 
habits  of  mind  will  lead  to  more  efficient  citizenship  and 
so to increased social stability. Minds trained to scientific
methods are less likely to be led by mere appeal to the
passions or by blind emotional excitement to sanction acts
which  in  the  end  may  lead  to  social  disaster.  In  the  first 
and  foremost  place,  therefore,  I  lay  stress  upon  the  edu- 
cational side  of  modern  science,  and  state  my  position  in 
some such words as these:

Modern  Science,  as  training  the  mind  to  an  exact 
and  impartial  analysis  of  facts,  is  an  education  spe- 
cially fitted  to  promote  sound  citizenship. 

Our  first  conclusion,  then,  as  to  the  value  of  science  for 
practical  life  turns  upon  the  efficient  training  it  provides 
in  method.  The  man  who  has  accustomed  himself  to  mar- 
shal facts,  to  examine  their  complex  mutual  relations,  and 
predict  upon  the  result  of  this  examination  their  inevitable 
sequences  —  sequences  which  we  term  natural  laws  and 
which  are  as  valid  for  every  normal  mind  as  for  that  of  the 
individual  investigator  —  such  a  man,  we  may  hope,  will 
carry  his  scientific  method  into  the  field  of  social  problems. 
He  will  scarcely  be  content  with  merely  superficial  state- 
ment, with  vague  appeal  to  the  imagination,  to  the  emotions, 
to  individual  prejudices.  He  will  demand  a  high  standard 
of  reasoning,  a  clear  insight  into  facts  and  their  results, 
and  his  demand  cannot  fail  to  be  beneficial  to  the  com- 
munity at  large. 

Essentials  of  Good  Science 

I  want  the  reader  to  appreciate  clearly  that  science 
justifies  itself  in  its  methods,  quite  apart  from  any  service- 
able knowledge  it  may  convey.  We  are  too  apt  to  forget 
this  purely  educational  side  of  science  in  the  great  value 
of  its  practical  applications.  We  see  too  often  the  plea 
raised  for  science  that  it  is  useful  knowledge,  while  philology 
and philosophy are supposed to have small utilitarian or
commercial  value.  Science,  indeed,  often  teaches  us  facts 
of  primary  importance  for  practical  life ;  yet  not  on  this 
account,  but  because  it  leads  us  to  classifications  and  sys- 
tems independent  of  the  individual  thinker,  to  sequences 
and laws admitting of no play-room for individual fancy,
must  we  rate  the  training  of  science  and  its  social  value 
higher  than  those  of  philology  and  philosophy.  Herein 
lies  the  first,  but  of  course  not  the  sole,  ground  for  the  popu- 
larization of  science.  That  form  of  popular  science  which 
merely  recites  the  results  of  investigations,  which  merely 
communicates  useful  knowledge,  is  from  this  standpoint  bad 
science,  or  no  science  at  all.  Let  me  recommend  the 
reader  to  apply  this  test  to  every  work  professing  to  give  a 
popular  account  of  any  branch  of  science.  If  any  such  work 
gives  a  description  of  phenomena  that  appeals  to  his 
imagination  rather  than  to  his  reason,  then  it  is  bad  science. 
The  first  aim  of  any  genuine  work  of  science,  however  popu- 
lar, ought  to  be  the  presentation  of  such  a  classification 
of  facts  that  the  reader's  mind  is  irresistibly  led  to  acknowl- 
edge a  logical  sequence  —  a  law  which  appeals  to  the  reason 
before  it  captivates  the  imagination.  Let  us  be  quite  sure 
that  whenever  we  come  across  a  conclusion  in  a  scientific 
work  which  does  not  flow  from  the  classification  of  facts, 
or  which  is  not  directly  stated  by  the  author  to  be  an  as- 
sumption, then  we  are  dealing  with  bad  science.  Good 
science  will  always  be  intelligible  to  the  logically  trained 
mind,  if  that  mind  can  read  and  translate  the  language  in 
which  science  is  written.  The  scientific  method  is  one  and 
the  same  in  all  branches,  and  that  method  is  the  method  of 
all  logically  trained  minds.   .  .  . 

I  would  not  have  the  reader  suppose  that  the  mere  pe- 
rusal of  some  standard  scientific  work  will,  in  my  opinion, 
produce  a  scientific  habit  of  mind.  I  only  suggest  that  it 
will  give  some  insight  into  scientific  method  and  some  appre- 
ciation of  its  value.  Those  who  can  devote  persistently 
some  four  or  five  hours  a  week  to  the  conscientious  study 
of any one limited branch of science will achieve in the
space of a year or two much more than this. The busy
layman  is  not  bound  to  seek  about  for  some  branch  which 
will  give  him  useful  facts  for  his  profession  or  occupation 
in life. It does not indeed matter for the purpose we have
now  in  view  whether  he  seek  to  make  himself  proficient 
in  geology,  or  biology,  or  geometry,  or  mechanics,  or  even 
history  or  folklore,  if  these  be  studied  scientifically.  What 
is  necessary  is  the  thorough  knowledge  of  some  small  group 
of  facts,  the  recognition  of  their  relationship  to  each  other, 
and  of  the  formulae  or  laws  which  express  scientifically 
their  sequences.  It  is  in  this  manner  that  the  mind  be- 
comes imbued  with  the  scientific  method  and  freed  from 
individual  bias  in  the  formation  of  its  judgments.  .  .  . 

The  Scope  of  Science 

The  reader  may  perhaps  feel  that  I  am  laying  stress  upon 
method  at  the  expense  of  material  content.  Now  this  is  the 
peculiarity  of  scientific  method,  that  when  once  it  has  be- 
come a  habit  of  mind,  that  mind  converts  all  facts  whatso- 
ever into  science.  The  field  of  science  is  unlimited ;  its 
material  is  endless,  every  group  of  natural  phenomena, 
every  phase  of  social  life,  every  stage  of  past  or  present 
development  is  material  for  science.  The  unity  of  all  science 
consists  alone  in  its  method,  not  in  its  material.  The  man 
who  classifies  facts  of  any  kind  whatever,  who  sees  their 
mutual relation, and describes their sequences, is applying
the  scientific  method  and  is  a  man  of  science.  The  facts 
may  belong  to  the  past  history  of  mankind,  to  the  social 
statistics  of  our  great  cities,  to  the  atmosphere  of  the  most 
distant stars, to the digestive organs of a worm, or to the life
of  a  scarcely  visible  bacillus.  It  is  not  the  facts  themselves 
which  form  science,  but  the  method  in  which  they  are  dealt 
with. The material of science is coextensive with the whole
physical  universe,  not  only  that  universe  as  it  now  exists, 
but  with  its  past  history  and  the  past  history  of  all  life 
therein.  When  every  fact,  every  present  or  past  phenome- 
non of  that  universe,  every  phase  of  present  or  past  life  therein, 
has  been  examined,  classified,  and  coordinated  with  the 
rest,  then  the  mission  of  science  will  be  completed.  What 
is  this  but  saying  that  the  task  of  science  can  never  end 
till  man  ceases  to  be,  till  history  is  no  longer  made,  and 
development  itself  ceases? 

It  might  be  supposed  that  science  has  made  such  strides 
in  the  last  two  centuries,  and  notably  in  the  last  fifty  years, 
that  we  might  look  forward  to  a  day  when  its  work  would 
be  practically  accomplished.  At  the  beginning  of  this  cen- 
tury it  was  possible  for  an  Alexander  von  Humboldt  to  take 
a  survey  of  the  entire  domain  of  then  extant  science.  Such 
a  survey  would  be  impossible  for  any  scientist  now,  even 
if  gifted  with  more  than  Humboldt's  powers.  Scarcely 
any  specialist  of  to-day  is  really  master  of  all  the  work 
which  has  been  done  in  his  own  comparatively  small  field. 
Facts  and  their  classification  have  been  accumulating  at 
such  a  rate  that  nobody  seems  to  have  leisure  to  recognize 
the  relations  of  sub-groups  to  the  whole.  It  is  as  if  indi- 
vidual workers  in  both  Europe  and  America  were  bringing 
their  stones  to  one  great  building  and  piling  them  on  and 
cementing  them  together  without  regard  to  any  general 
plan  or  to  their  individual  neighbor's  work ;  only  where 
some  one  has  placed  a  great  corner-stone,  is  it  regarded, 
and  the  building  then  rises  on  this  firmer  foundation  more 
rapidly  than  at  other  points,  till  it  reaches  a  height  at 
which  it  is  stopped  for  want  of  side  support.  Yet  this  great 
structure,  the  proportions  of  which  are  beyond  the  ken 
of  any  individual  man,  possesses  a  symmetry  and  unity 
of its own, notwithstanding its haphazard mode of construction.
This symmetry and unity lie in scientific method.
The  smallest  group  of  facts,  if  properly  classified  and  logi- 
cally dealt  with,  will  form  a  stone  which  has  its  proper 
place  in  the  great  building  of  knowledge,  wholly  independent 
of  the  individual  workman  who  has  shaped  it.  Even  when 
two  men  work  unwittingly  at  the  same  stone  they  will  but 
modify  and  correct  each  other's  angles.  In  the  face  of  all 
this  enormous  progress  of  modern  science,  when  in  all  civ- 
ilized lands  men  are  applying  the  scientific  method  to  natural, 
historical,  and  mental  facts,  we  have  yet  to  admit  that  the 
goal  of  science  is  and  must  be  infinitely  distant. 

For  we  must  note  that  when  from  a  sufficient  if  partial 
classification  of  facts  a  simple  principle  has  been  discovered 
which  describes  the  relationship  and  sequences  of  any  group, 
then  this  principle  or  law  itself  generally  leads  to  the  dis- 
covery of  a  still  wider  range  of  hitherto  unregarded  phe- 
nomena in  the  same  or  associated  fields.  Every  great 
advance  of  science  opens  our  eyes  to  facts  which  we  had 
failed  before  to  observe,  and  makes  new  demands  on  our 
powers  of  interpretation.  This  extension  of  the  material 
of  science  into  regions  where  our  great-grandfathers  could 
see  nothing  at  all,  or  where  they  would  have  declared  human 
knowledge  impossible,  is  one  of  the  most  remarkable  features 
of  modern  progress.  Where  they  interpreted  the  motion 
of  the  planets  of  our  own  system,  we  discuss  the  chemical 
constitution  of  stars,  many  of  which  did  not  exist  for  them, 
for  their  telescopes  could  not  reach  them.  Where  they  dis- 
covered the  circulation  of  the  blood  we  see  the  physical 
conflict  of  living  poisons  within  the  blood  whose  battles 
would have been absurdities for them. Where they found
void  and  probably  demonstrated  to  their  own  satisfaction 
that there was void, we conceive great systems in rapid motion
capable of carrying energy through brick walls as light
passes  through  glass.  Great  as  the  advance  of  scientific 
knowledge  has  been,  it  has  not  been  greater  than  the  growth 
of  the  material  to  be  dealt  with.  The  goal  of  science  is 
clear  —  it  is  nothing  short  of  the  complete  interpretation 
of  the  universe.  But  the  goal  is  an  ideal  one  —  it  marks 
the  direction  in  which  we  move  and  strive,  but  never  a  stage 
we  shall  actually  reach.  The  universe  grows  ever  larger 
as  we  learn  to  understand  more  of  our  own  corner  of  it. 

REVIEW 

1.  How  does  Pearson  sum  up  the  essence  of  modern  science? 

2.  What  is  the  test  which  he  applies  to  determine  whether  a 
social  institution  should  be  encouraged?  Do  you  think  he  has  in 
mind  "modern  business"  as  a  social  institution? 

3.  Why  stimulate  the  development  of  the  scientific  method? 
What  does  Pearson  mean  by  citizenship?  Would  his  reasoning 
apply  to  business  methods  and  economic  dealings  as  well  as  to 
those  which  are  political?     Show  why  or  why  not. 

4.  What standards does he use to distinguish good and bad
science?

5.  Some one has said that "scientific method is the method of
noting and classifying differences." What is meant by this statement?
Does this point of view correspond to Pearson's?

6.  How is the scientific method a "habit of mind"? What does
Pearson mean by saying "The unity of all science consists alone
in its method, not in its material"? Is there a similar unity in all
business,  as  in  "business  organizations,"  "personnel  administra- 
tion," "market  problems"? 

7.  What  business  and  economic  conflicts  can  you  suggest,  the 
solution  of  which  calls  for  the  application  of  scientific  method? 

8.  Apply  the  point  of  view  suggested  by  Pearson  to  such  prob- 
lems as  the  cost  of  living ;  increase  in  fares  on  street  railways  in 
view  of  increased  operating  expenses ;  advance  in  the  selling  price 
of  a  competitive  good  because  of  increase  in  cost  of  production ;  in 
marketing. 


Why Statistics and Its Methods? ¹

Probably  at  no  other  period  have  statistics  played  so 
large  a  part  in  our  problems.  We  have  continuous  sectional 
enumerations.  We  have  a  decennial  survey  of  the  whole 
country. We make studies of feeble-mindedness, of educational
possibilities, of industrial output. We are making
a  physical  valuation  of  the  railroads.  We  have  taken 
to  heart  the  lesson  Bagehot  taught  in  his  "Lombard  Street," 
and  the  Federal  Reserve  Board  does  for  American  bank- 
ing the  work  that  he  planned  for  English  concerns.  Yet 
fundamental  questions  still  go  unanswered.  We  are  con- 
tent with  tabulation  rather  than  analysis.  We  enumerate 
where  we  should  interpret.  In  the  result,  in  any  crisis  — 
such  as  the  present  railroad  situation  —  where  figures 
are  involved  we  have  no  means  of  interpreting  at  all  ade- 
quately their  significance. 

The  movement  for  the  eight-hour  day  is  asserted  by 
manufacturers  to  be  prohibitive  in  its  cost.  We  have  no 
means  at  our  disposal  of  checking  that  assertion.  Our 
statistics  seem  to  have  been  gathered  for  every  purpose 
save  that  of  getting  answers  to  the  basic  questions.  We 
have no means of checking the relation between population
and the means of subsistence. That which most painfully
arrests our attempts at progress is the absence of impersonal
record. The demand for the abolition of child
labor  was  postponed  for  years  by  our  fear  of  the  formidable 
bill  of  costs  presented  to  us  by  business  men.  We  were 
first  told  that  child  labor  was  the  price  we  had  to  pay  for 
the  continuance  of  certain  industries.  Then  when  differ- 
ent states  passed  child  labor  laws  we  were  informed  that 
the backward states could not compete with those more
highly organized. We felt the wrongness of these arguments.
We did not put any faith in the statistics presented
for our consumption. We could only plead the
virtue of experiment, and sneer into extinction the habitual
conservatism of the business man. Yet all the time we
dimly realized that we must pay a large price for our faith.
We could not but wonder if there was not a better, a more
adequate way.

1  Taken with permission from The New Republic, August 26, 1916.

As  a  fact,  a  better  way  exists.  The  devil  can  cite 
statistics  for  his  purpose,  so  that,  for  ordinary  men 
and  women,  they  have  been  tainted  with  the  suspicion  that 
clings  to  his  usage  of  them.  The  last  twenty-five  years 
have  seen  a  revolution  in  statistical  method.  Enumera- 
tion has  given  way  to  critical  analysis.  Under  the  brilliant 
leadership  of  Professor  Karl  Pearson  there  has  been  evolved  a 
new  social  calculus  of  which  the  first  fruits  even  are  of  strik- 
ing importance.  We  have  already  seen  valuable  results 
in  the  study  of  education.  What,  for  example,  is  the  worth 
of  the  teacher's  estimate  of  his  pupils'  ability?  It  is  clearly 
fundamental  to  have  a  solution  to  such  a  problem  and  Pro- 
fessor Pearson  has  given  us  a  response  in  definitely  meas- 
urable terms.  Or  turn  to  social  disease.  We  require  to 
know  what  is  the  actual  worth  of  our  sanatorium  treat- 
ment of  tuberculosis.  Is  the  average  length  of  life  of  those 
who  are  returned  as  cured  to  the  general  population  the 
same  as  that  of  the  normal  healthy  man?  Theory  clearly 
requires  an  affirmative  answer  ;  the  result,  as  Professor  Pear- 
son has  shown,  is  in  fact  different,  so  that  we  begin  to  under- 
stand that  the  fundamental  problem  is  here  the  diathesis 
and  that  it  is  upon  its  understanding  that  our  attention 
must  concentrate.  So,  too,  in  the  problem  of  wages.  We 
require  a  means  of  interpreting  the  means  of  life  in  terms 
of every social relation that is of communal importance.
What, for example, is the cyclic relation of wage-movements
to rent? What is the relation of rents to size of
family?  What  is  the  relation  of  food  prices  to  rent?  Does 
a  decrease  in  the  cost  of  food  result  in  a  movement  towards 
more  satisfactory  housing?  Or  take  the  problem  of  infant 
mortality. We are too easily satisfied with its interpretation
in terms either of the mother's employment or conditions
of bad environment. Modern methods of statistics
enable us to go a step further. We find, for instance,
that the wife works because her husband has low wages.
We find that her husband has low wages because he works
in a poorly paid trade entrance into which is the result
either of bad physique or poor intelligence. The single
problem of infant mortality is thus in fact seen to involve
the  whole  circle  of  economic  disharmonies.  A  beginning 
in  the  required  direction  is  shown  by  Miss  Lathrop's  im- 
portant reports  from  the  Children's  Bureau  of  the  Depart- 
ment of  Labor.  Mr.  Goring's  great  work  on  criminology, 
Miss  Elderton's  studies  of  alcoholism,  Miss  Harrington's 
on  eye-sight  only  repeat  the  same  results  in  different  form. 

They  present  conclusions  from  which  there  is  no  escape. 
The fundamental business is to measure the quality of inheritance
in terms of the quality of environment. For
that end we need a census survey which is not intrusted
merely to competent Democrats or trustworthy Republicans.
It must be a survey in which medical men, statisticians,
industrial experts, educators, all obtain representation.
And it must be emphasized that the old
statistics are out of date. We need the application to our
data  of  the  newest  instruments  at  our  service.  Interesting 
as  our  surveys  like  that  of  Pittsburgh  are,  they  have  the 
fundamental  defect  of  lack  of  precision.  The  social  worker 
who has impressions to record must record them to-day
in  such  form  as  admits  of  statistical  treatment.  We  have 
passed beyond the stage where qualitative description is
possible. Here, as elsewhere, it is in quantitative expression
only that we can place any confidence. The ideal type-survey
is the Report on the Physical and Mental Condition
of Edinburgh School Children prepared by the Charity
Organization  Society  of  that  city.  We  can  measure  there 
exactly  those  qualities  of  which  we  desire  to  know  the  re- 
sult in  social  practice.  What  is  the  effect  of  parental  al- 
coholism on  the  health  of  the  children?  How  far  does  it 
affect  wages?  What  harm  does  industrial  instability  do 
to  the  attendance  of  the  child  at  school?  How  far  does  a 
dirty  home  affect  the  intelligence  of  the  child?  All  these 
questions  can  be  given  a  partial  answer  from  the  Edinburgh 
report.  But  we  want  to  check  Edinburgh  by  London  and 
London  by  New  York.  We  want  a  study  of  Chicago,  of 
Atlantic,  of  St.  Louis.  It  is  upon  knowledge  made  definite 
and measurable that the advance of the future will be secured.

Beginning  with  the  great  rate  inquiries  of  1910  Mr. 
Brandeis  made  earnest  pleas  for  the  establishment  of  a  Bureau 
of  Cost  Accounting.  We  are  paying  the  price  now  for  our 
failure  to  take  proper  advantage  of  his  counsel.  Nothing 
would  have  contributed  more  to  our  understanding  of  the 
railroad  situation  than  the  ability  to  compare,  item  by  item, 
the  method  and  cost  of  operation  of  each  railroad  in  the 
country.  We  could  have  thus  obtained  a  kind  of  composite 
portrait  of  conditions  which  would  have  gone  far  to  re- 
move the  haze  and  dimness  of  our  present  uncertainty. 
We  could  have  known,  for  instance,  the  exact  way  in  which 
the  Boston  and  Maine  Railroad  has  improved  its  earning 
capacity  relative  to  the  comparative  failure  of  the  New 
Haven Road. We would have forecasted means of improvement.
We could have suggested maximum costs of output
in  every  branch  of  railroad  operation.  If  thus  far  we  have 
failed  in  our  wisdom,  we  may  no  longer  wait  upon  the  event. 

We  want,  further,  studies  of  wage  situations  such  as  those 
which Mr. R. H. Tawney is making of the industries governed
by the Trade-Board Act of Great Britain. Our
own  students  are  too  prone,  in  similar  work,  to  describe 
methods  of  operation  and  give  statistics  of  output  as  the 
right  method  of  approach.  Mr.  Tawney's  method  is  re- 
freshingly different.  He  studies  in  definite  terms  the  work- 
ing of  his  industry.  He  explains  its  reaction  to  the  wages 
of  men  and  women,  to  prices  and  profits,  to  trade  unionism. 
He  studies  in  detail  its  effect  no  less  on  the  management 
of  industry  than  on  the  workers.  He  discusses  the  rela- 
tion of  minimum  rates  to  degree  and  security  of  employ- 
ment, and  to  home  work.  He  makes  evident  the  defects 
and  virtues  of  the  administration  of  minimum  rates.  A 
single,  brief  chapter  gives  us  all  we  require  to  know  of  the 
actual  method  by  which  the  industry  is  organized.  The 
study,  as  a  whole,  is  a  triumphant  vindication  of  the  prin- 
ciples underlying  the  demand  for  the  minimum  wage.  But 
it  is  a  vindication  almost  uniquely  valuable  in  industrial 
inquiry  in  that  its  conclusions  are  based  on  the  provision 
of  an  unimpeachable  bill  of  costs  which  is,  from  the  sta- 
tistical standpoint,  as  imaginatively  conceived  as  it  is  bril- 
liantly executed. 

We  in  America  can  be  satisfied  with  no  less  than  this. 
Under  the  Commerce  Clause  trade  is  in  the  hands  of  Con- 
gress, the  Interstate  Commerce  Commission,  the  Federal 
Trade  Commission,  both  of  these  having  their  statistical 
departments.  It  is  not  too  much  to  ask  that  the  methods 
they  apply  to  their  problems  be  such  as  are  most  likely  to 
provide the best basis for public judgment. A statistician
is  no  longer  a  clerk,  but  a  mathematician  who  has  special- 
ized in  the  theory  of  probability.  We  want  men  of  that 
kind  to  direct  our  inquiries.  We  want  all  those  engaged 
in  social  work  to  think  out  collectively  the  right  questions 
and  to  analyze  our  material  by  the  methods  which  alone 
give  promise  of  sufficient  response.  Statistics  is  no  longer 
a  matter  in  which  a  single  university  course  explaining 
the  means  by  which  an  average  is  calculated  is  really  ade- 
quate. What  we  need  attached  to  every  important  govern- 
ment department  and  every  great  university  is  a  statistical 
laboratory  such  as  that  of  which  Professor  Pearson  has  the 
direction  in  London  University.  We  shall  then  begin  to 
know  the  basis  upon  which  our  social  problems  really  rest. 
We  shall  then  have  satisfactory  demonstration  of  the  best 
lines of their efficient understanding.

REVIEW 

1.  Sketch  the  bill  of  complaint  given  expression  to  in  "Why 
Statistics  and  Its  Methods."  Does  this  complaint  have  any  ap- 
plication in  the  business  with  which  you  are  connected  or  in  the 
statistical  activities  of  any  business  or  agency  with  which  you  are 
acquainted? How?

2.  Is the business man interested in these larger problems outside
of "his" business? Why? What are the limits of his business?

3.  Can  such  a  problem  as  the  eight-hour  day  be  settled  by 
statistics?  Would  statistics  have  any  bearing  on  such  a  problem 
as  the  tariff?  Why?  On  the  establishment  of  a  wage  policy? 
How? 

4.  If  statistics  were  collected  solely  to  settle  "basic  questions" 
would  not  the  occasions  for  collecting  them  be  so  diverse  that  little 
general  knowledge  would  be  obtained?  Would  such  statistics 
meet  the  day-to-day  needs  of  business  men,  students  of  economic 
and  social  conditions?     Illustrate  why  or  why  not. 

5.  Which  seems  to  you  preferable  on  the  part  of  governmental 
statistical agencies, (a) solely to collect "general purpose" statistics,
or (b) solely to collect "special purpose" statistics? Can
there  be  a  combination  of  both?  If  so,  who  should  determine  the 
kinds  of  activities  to  which  the  statistics  apply?  Name  some 
general  purpose  statistics  which  are  collected,  which  are  of  interest 
to  a  business  man  as  a  business  man,  and  to  him  solely  as  a  citizen. 
Can  you  think  of  any  collected  which  seem  primarily  to  answer  his 
problems?  Consult  the  following  statistical  publications  with 
these points in mind: The Census of Manufactures, U. S. Census;
Monthly Summary of Commerce and Finance; Bradstreet's; The
Statistical Abstract, U. S. Department of Commerce; Iron Age.

6.  How  do  you  explain  the  attitude  voiced  in  the  following  state- 
ment? "We  cry  aloud  for  facts;  there  is  a  voracious  and  undis- 
criminating  appetite  for  figures,  or  rather  for  the  nourishment 
they  afford  to  argument  and  propaganda ;  statesmen,  teachers, 
preachers,  publicists,  and  men  in  the  street  exemplify  it.  It  is  a 
dyspeptic  appetite,  if  you  please,  because  of  the  ill-assorted  wares 
upon  which  it  feeds.  On  the  other  hand,  there  is  an  almost  equally 
common  and  more  or  less  outspoken  distrust  of  statistics  or  the 
widespread  application  of  the  statistical  method  as  a  means  of 
obtaining  working  knowledge."  What  is  to  be  done  about  this? 
Wherein  does  the  difficulty  lie? 

7.  A contrast has been drawn between what is called "statistical
foresight" and "statistical hindsight." What does such a contrast
mean to you? Is this distinction identical with that between
"statistical planning" and "statistical planlessness"? Would
statistical planning in your judgment largely correct the condition
described in question 6?

Statistical Control Including Costs as a Factor in Production *

General.  —  A  manager  desiring  to  determine  the  best 
place  at  which  to  locate  a  particular  type  of  retail  store, 
considers possible locations from many points of view, including
casual observations of the places where the greatest
number of possible customers seem to pass. He then
stations at each of these places an observer who, in a square
on a tally-sheet ruled in a carefully predetermined manner,
makes a mark as each person passes. After the observations
have been completed and the marks in the various
squares are counted, the manager is enabled to establish
a number of facts pertinent to the problem such as the
following: the average number of persons who pass during
a day; the average who pass each hour of the day; the
average number of men who pass each hour of the day;
of women; of children; the number of office girls who
pass during the lunch hour; etc. These group facts, discovered
by recording and classifying the mass of unit facts,
are of importance in helping him to decide a problem of
business policy.

1  See W. C. Mitchell, "Statistics and Government," in Quarterly Publications
of the American Statistical Association, March, 1919, pp. 223-235.

*  Adapted with permission from Person, Harlow S., The Annals of the
American Academy of Political and Social Science, September, 1919, pp.
220-230.
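
The tally-sheet count described above may be sketched in a few lines of Python. This is an illustration only: the recorded marks, the two days of observation, and the names `marks`, `average_per_day` and `average_per_hour` are all invented here and form no part of the original account.

```python
from collections import Counter

# Each tally mark is (day, hour of day, kind of passer-by).
# Every value below is invented for illustration.
marks = [
    (1, 12, "woman"), (1, 12, "man"), (1, 12, "woman"), (1, 13, "child"),
    (2, 12, "man"), (2, 12, "woman"), (2, 13, "man"), (2, 13, "woman"),
]

def average_per_day(marks):
    """Average number of persons passing per day of observation."""
    days = {day for day, _, _ in marks}
    return len(marks) / len(days)

def average_per_hour(marks, kind=None):
    """Average number passing in each hour of the day, optionally for one kind."""
    days = {day for day, _, _ in marks}
    counts = Counter(hour for _, hour, k in marks if kind is None or k == kind)
    return {hour: n / len(days) for hour, n in sorted(counts.items())}

print(average_per_day(marks))
print(average_per_hour(marks))
print(average_per_hour(marks, kind="woman"))
```

Each printed figure is a "group fact" in the sense of the text: no single mark carries it, yet it follows at once from classifying and counting the whole.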

If  a  merchant  sells  hats  for  a  season  and  keeps  no  record 
of  sizes  sold,  he  is  at  a  loss  to  place  precise  orders  for  the  next 
season.  He  may  have  a  general  impression  that  he  had 
better  place  in  stock  more  of  a  given  size  than  of  other 
sizes,  but  a  "general  impression"  is  not  precision,  control 
and  economy  in  operation.  On  the  other  hand,  if  he  has 
kept records, he may find he has sold 50 size 6¾; 150 size
6⅞; 300 size 7; 500 size 7⅛; 400 size 7¼; 150 size 7⅜; etc.,
in all some 1600 hats. He estimates that his sales will
amount to 2000 hats next season and divides the order for
that number in the ratios with respect to size, of .5, 1.5, 3,
5, 4, 1.5, etc., and feels certain that he is forecasting his market
with  precision. 
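
The merchant's division of his order can be sketched as a proportional allocation. The Python fragment below is an illustration under stated assumptions: the size labels are placeholders, the "other" bucket of 50 hats stands in for the "etc." of the text so that the total reaches 1600, and the rule for rounding to whole hats (largest fractional part first) is supplied here and is not given in the original.

```python
from fractions import Fraction

# Hats sold last season, by size. Labels are placeholders; "other" is an
# assumed bucket standing in for the "etc." of the text (total = 1600).
past_sales = {
    "size_a": 50, "size_b": 150, "size_c": 300,
    "size_d": 500, "size_e": 400, "size_f": 150, "other": 50,
}

def allocate(past_sales, order_total):
    """Divide order_total among sizes in proportion to past sales."""
    total_sold = sum(past_sales.values())
    # Exact proportional share of each size, kept as a fraction.
    exact = {s: Fraction(n * order_total, total_sold) for s, n in past_sales.items()}
    # Start from the whole-number part of each share.
    order = {s: int(share) for s, share in exact.items()}
    # Give the units lost in rounding down to the largest fractional parts.
    leftover = order_total - sum(order.values())
    by_fraction = sorted(exact, key=lambda s: exact[s] - order[s], reverse=True)
    for s in by_fraction[:leftover]:
        order[s] += 1
    return order

next_order = allocate(past_sales, 2000)
for size, count in next_order.items():
    print(size, count)
```

The ratios .5, 1.5, 3, 5, 4, 1.5 quoted in the text are simply last season's sales divided by 100, so the allocation from the raw sales figures and the allocation from the ratios come to the same thing.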

These  illustrations  should  suggest  to  the  reader  the 
nature,  the  purpose  and  the  methods  of  statistics  in  busi- 
ness. An  illustration  might  have  been  used  in  which  facts 
are entered on "forms" in an office, as documents resulting
from operations and carrying different kinds of data
(units  of  product ;  wages ;  sales ;  complaints ;  prices  of 
materials ;  etc.)  pass  through  the  office.  The  magnitude 
of  the  business,  the  volume  of  the  data,  the  number  ob- 
served, recorded,  classified,  compared,  and  otherwise  handled, 
make  no  difference. 

Nature  and  Purpose  of  Statistics.  —  A  "fact, "  the  relations 
of  which  are  obscured,  has  little  or  no  significance.  A  single 
person  passing  the  observer  in  the  first  illustration  has  no 
meaning  or  importance.  Related  to  the  problem  of  locat- 
ing the  store  he  begins  to  assume  importance.  Related 
to  that  problem  as  one  person  in  an  aggregate  of  persons 
passing  the  observer,  he  becomes  in  this  relationship  of 
great  importance;  but  by  becoming  part  of  an  aggregate 
of  persons  he  is  transformed  into  one  of  a  mass  of  data  so 
numerous  as  to  confuse  the  mind,  which  is  limited  in  its 
processes  of  observing,  valuing,  remembering,  and  compar- 
ing separate  experiences  which  come  to  it  casually.  The 
mind  is  unable  to  grasp  the  significance  of  larger  sum- 
marizing facts  behind  or  contained  in  the  mass. 

Yet  there  are  summarizing  facts  there,  facts  which  result 
from the bringing together and analysis of the aggregate.
Statistics  is  the  science  and  the  art  of  handling  aggregates 
of  facts  —  observing,  enumerating,  recording,  classifying, 
and  otherwise  systematically  treating  them  —  so  that 
other  "master"  facts  or  principles  or  laws  lying  behind 
or  contained  in  the  aggregate  are  made  comprehensible 
to  the  mind  and  become,  along  with  the  results  of  other 
methods  of  investigation,  data  for  reasoning,  the  drawing 
of  conclusions,  the  making  of  decisions,  and  the  determina- 
tion of  policy. 

Statistical  Methods.  —  There  have  been  developed  many 
devices for the summarizing and analysis of statistical
data  such  as  the  per  cent  and  the  arithmetic  average.  No 
manager  of  a  plant  of  any  size,  for  instance,  could  carry  in 
his  head  the  number  of  hirings  and  separations  for  two  or 
three  years.  Yet  if  recorded  these  facts  can  be  classified 
and  summarized  through  the  medium  of  coefficients,  and 
the  mind  can  easily  reason  in  terms  of  the  coefficients,  which 
sum  up  group  facts  behind  the  unit  facts.  That  labor 
turnover  was  43  per  cent  in  1918,  and  27  per  cent  in  1919, 
is  the  statement  of  two  significant,  comprehensible,  sum- 
marizing facts  yielded  by  proper  treatment  of  a  large  num- 
ber of  accumulated  unit  facts,  which  considered  individ- 
ually had  relatively  little  significance.  The  business 
statistician  does  not  need,  in  the  present  stage  of  the  de- 
velopment of  the  art  of  statistics  in  business,  to  go  into 
such  refinements  of  statistical  method  as  are  necessary 
for,  let  us  say,  the  biologist,  or  even  a  department  of  public 
health.  Extreme  refinements  of  method  yield  only  a  fic- 
titious accuracy  when  the  preceding  steps  of  observation, 
enumeration,  and  classification  are  lacking  in  precision, 
or  the  data  are  not  in  great  volume,  which  is  usually  the 
case  in  a  business.  The  accuracy  of  a  chain  of  reasoning 
can  be  no  greater  than  its  weakest  link. 
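
The reduction of many unit facts to a single summarizing coefficient may be illustrated by a short sketch. The records below are hypothetical, invented only so that the results agree with the 43 and 27 per cent quoted above. The formula used (separations divided by the average working force) is likewise an assumption, since the text does not state the basis of its figures.

```python
from collections import Counter

# Hypothetical unit facts: one record per separation, giving only the year.
# The counts are invented so that the results match the percentages in the text.
separation_records = [1918] * 430 + [1919] * 270

# Assumed average working force for each year (hypothetical).
average_force = {1918: 1000, 1919: 1000}


def turnover_percent(records, force_by_year):
    """Summarize individual separation records into one per cent per year."""
    separations = Counter(records)  # classify and count the unit facts by year
    return {
        year: 100 * separations[year] / force_by_year[year]
        for year in sorted(force_by_year)
    }


summary = turnover_percent(separation_records, average_force)
for year, pct in summary.items():
    print(f"{year}: labor turnover {pct:.0f} per cent")
```

Seven hundred separate records, none significant alone, are here reduced to two figures in which the mind can readily reason.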

But  if  refinement  of  method  in  the  mathematical  treat- 
ment of  data  is  unnecessary  in  the  use  of  statistics  in  busi- 
ness, too  great  care  cannot  be  exercised  with  respect  to  the 
collection of data. The summarized data become premises
for reasoning, and to the extent that they have been incorrectly
labeled and classified in the process of collection
and  recording,  the  reasoning  and  the  conclusions  of  which 
they  become  the  basis  are  unreliable.  The  skilled  stat- 
istician wants  to  know  how  the  data  were  collected  — 
are  they  complete  or  a  good  sample  of  the  mass  of  unit 
facts under consideration; is classification exact; are compared
averages the averages of like things; etc.? The
critical stage in statistical investigation is the first stage:
the determination of the purpose of the investigation;
precise definitions of different kinds of unit facts to be recorded;
the careful recording, classification, and summarizing
of these unit facts in accordance with the precise definitions.
From that stage on statistical processes are
simple.  It  is  in  that  stage  that  the  difficulties  lie  and  the 
errors  are  made. 

Homogeneity  of  Statistical  Units.  —  That  the  original 
units  of  observation  and  record  should  be  homoge- 
neous is  the  primary  rule  of  all  worth-while  statistical  effort. 
This  depends  upon  careful  definitions.  If  definitions  are 
not  exact,  dissimilar  things  will  be  enumerated  under  the 
same  head  by  different  observers  or  recorders,  homogeneity 
will not exist, and the summaries and averages will not
be  comparable.  One  recorder  might  include  under  "wages" 
some  payments  that  another  includes  under  "salaries." 
One  might  include  under  "worked  materials"  some  things 
that  another  includes  under  "stores."  One  might  include 
in  the  length  of  time  it  takes  to  perform  an  operation,  the 
time  between  the  start  and  finish  of  the  operation  that 
the machine is idle; another might not. The statistics
of  labor  turnover  published  to-day  are  generally  incom- 
parable because  of  this  error.  In  one  plant  "separations" 
is  made  the  basis  of  computation;  in  another  "hirings." 
In one plant the working force may be increasing, in another
decreasing; neither "separations" nor "hirings"
has  the  same  significance  in  the  one  as  in  the  other.  Dif- 
ferent unit  facts  are  classified  under  the  same  head  and  the 
law  of  homogeneity  is  violated.  Resultant  averages  are 
not  comparable. 
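
Why turnover figures computed on different bases cannot be compared may be seen from a small sketch. The plant figures and the formula are assumptions made for illustration only. The text names the two bases, "separations" and "hirings," but gives no formula for either.

```python
def turnover_rate(events, average_force):
    """Per cent of the average working force represented by the events counted."""
    return 100 * events / average_force


# Hypothetical year for one growing plant: more workers hired than lost.
hirings = 500
separations = 300
average_force = 1000

rate_on_hirings = turnover_rate(hirings, average_force)
rate_on_separations = turnover_rate(separations, average_force)

print(f"turnover on hirings basis:     {rate_on_hirings:.0f} per cent")
print(f"turnover on separations basis: {rate_on_separations:.0f} per cent")
```

The same plant in the same year yields two different "turnover" figures, so a figure from a plant that counts hirings cannot fairly be set beside one from a plant that counts separations.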

The primary statistical fact — statistical unit — observed
and recorded should not be a compound fact. To
use  a  chemical  analogy,  it  should  be  an  element.  Com- 
pounds can  be  built  up,  if  desired,  by  bringing  elements 
together.  The  recording  and  analysis  of  homogeneous 
primary  facts  require  planning  ability  and  cost  money, 
but  they  are  the  only  facts  worth  recording.  Later  at- 
tention will  be  directed  to  the  use  of  mechanical  devices 
which  make  possible  the  recording  and  classifying  of  unit 
facts  at  a  reasonable  cost. 

Statistics  in  Business.  —  The  application  of  statistics 
was  first  developed  by  governments  and  quasi-public  in- 
stitutions in  the  study  of  social  phenomena  and  was  then 
developed  and  carried  to  the  highest  degree  of  perfection 
in  technical  method  by  the  biologists  in  the  study  of  the 
laws  of  heredity.  In  these  fields  the  data  have  always 
been  so  numerous  as  to  compel  statistical  treatment,  and 
in  these  fields  great  discoveries  have  been  made  by  the 
statistical  method  of  investigation.  Among  business  in- 
stitutions the  first  to  use  statistical  methods  were  the  in- 
surance companies,  railroads,  and  similar  businesses,  the  data 
of  whose  operations  are  voluminous  and  usable  only  when 
statistically  handled.  With  the  broadening  of  markets 
and  the  increase  in  the  size  and  in  the  volume  of  business 
of other industrial institutions, the use of statistics increased
as an aid in establishing standards, and in interpreting
facts as a basis for the forecasting of tendencies and the
determination  of  policies.  To-day  there  are  few  large 
business  institutions  in  the  United  States  —  manufactur- 
ing or  distributive  —  which  do  not  have  statistical  de- 
partments, and  regard  for  the  statistical  function  in  smaller 
institutions  is  increasing  with  great  rapidity.  There  is 
scarcely a business of any size which could not use statistics
to advantage, the size of the "statistical department"
being purely a problem in overhead cost to be viewed
in  the  light  of  probable  returns.  There  is  an  advertis- 
ing company  which  carries  an  immense  and  costly  sta- 
tistical overhead,  but  the  result  of  the  work  of  that  de- 
partment has  made  the  company  impregnable  in  com- 
petition ;  its  clients  have  confidence  in  its  advice.  I  know 
of  a  small  distributing  house  in  which  a  young  graduate 
of  a  school  of  business  administration,  along  with  other 
duties  and  on  his  own  initiative,  began  to  record,  classify, 
and  analyze  data  according  to  the  statistical  method.  In 
one  year  he  proved  "master  facts  behind  the  mass  of  unit 
facts"  never  before  observed,  and  influenced  purchase 
policy  and  sales  policy  —  for  the  business  he  effected  econ- 
omies resulting  from  operations  in  accordance  with  better 
policies,  and,  for  himself,  proved  himself  worthy  to  be  a 
branch  manager.  Between  these  two  extremes  may  be 
found  throughout  business  a  great  variety  of  methods  of 
utilizing  statistics  in  investigation. 

The  Practical  Objects  of  Statistics  in  Business.  —  The  prin- 
cipal objectives  of  the  use  of  statistics  in  business  are  : 

1.  To  ascertain  inner,  controlling,  master  facts  which 
cannot  be  ascertained  by  casual  observation  of  the  complex 
mass  of  obvious  facts  which  constitute  the  experience  of 
the  business  and  in  which  they  are  contained.  The  sales 
manager  about  to  undertake  a  sales  campaign,  does  not 
trust  to  chance  or  to  casual  observation  more  than  is  nec- 
essary. He  investigates  and  analyzes  characteristics  of 
the  consuming  public  in  a  market  —  estimates  among  other 
things  their  probable  demand  for  and  capacity  to  purchase 
the  particular  commodity  he  proposes  to  introduce,  and 
the  kind  of  advertising  methods  to  which  the  purchasers 
of that market are most likely to react. The utility corporation
analyzes statistically a growing suburb, before it
determines its policy of extension and capital investment.
The  merchandise  and  credit  managers  of  a  wholesale  dis- 
tributing house  estimate  the  purchasing  power  of  a  region, 
through  the  statistical  analysis  of  crop  and  other  governing 
conditions,  before  determining  policy  with  respect  to  a  sea- 
son's business.  The  manager  of  a  retail  store  may  analyze 
sales  of  different  articles  by  sizes,  seasons,  etc.,  in  order 
to  determine  a  quality,  quantity,  and  seasonal  schedule 
of  purchases,  thereby  adjusting  orders  to  probable  turn- 
over. 

2.  To  determine  standards  by  which  to  value  and  guide 
current  performance  and  in  terms  of  which  to  estimate 
future  performance.  The  merchandise  manager  of  a  de- 
partment store  receives  each  morning  a  summary  sheet 
showing sales of the preceding day compared with sales
of the same day the year before; cumulative sales of the
month  to  date  compared  with  cumulative  sales  of  the  cor- 
responding period  of  the  year  before  and  with  estimates 
for  the  current  month ;  cumulative  sales  of  the  year  to  date 
compared  with  those  of  the  corresponding  year  before, 
and  so  on.  He  can  ascertain  at  a  glance  whether  sales 
are  going  well ;  if  they  are  not  he  may  institute  at  once  a 
special  sales  campaign.  Likewise  any  business  selling  com- 
modities or  services.  A  production  manager  time-studies 
operations  under  different  conditions  and  with  different 
materials  and  methods,  and  by  statistical  treatment  of  the 
data  establishes  several  standards  :  standards  of  conditions ; 
of  materials ;  of  methods ;  of  performance.  He  can  then 
value  and  guide  current  performance  and  can  estimate  with 
precision  future  performance.  He  may  keep  his  record  in 
terms  of  units  of  output  and  in  terms  of  units  of  cost. 
Cost  units  are  no  different  from  other  units  in  statistical 
treatment. A telephone company analyzes statistical records
of calls and establishes a standard of performance for an
operator  or  for  a  system  and  on  the  basis  of  these  stand- 
ards can  determine  whether  an  operator  is  efficient  or  a 
system  is  approaching  the  volume  of  business  for  which  it 
will  be  inadequate,  requiring  extension  or  replacement. 
The  electric  light  or  telephone  or  other  similar  company, 
by  statistical  records  determines  the  hours,  the  days,  and 
other  seasons  when  its  various  peak  loads  are  bound  to 
occur,  and  establishes  operating  policy  accordingly.  A 
supply  division  of  the  army  or  navy,  by  statistical  methods, 
determines  a  procurement  and  delivery  schedule  for  an  army 
of  a  given  size  under  predetermined  conditions  of  activity, 
and  by  similar  statistical  methods  determines  from  day 
to  day  whether  the  schedule  is  being  observed. 
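
The morning summary sheet of the merchandise manager, described above, may be sketched as follows. All sales figures and the monthly estimate are hypothetical, and "the same day the year before" is taken here to mean the same calendar date, which is an assumption. A store might equally well compare the same weekday.

```python
from datetime import date

# Hypothetical daily sales in dollars, keyed by calendar date.
sales = {
    date(1919, 3, 1): 1200, date(1919, 3, 2): 1100, date(1919, 3, 3): 1300,
    date(1920, 3, 1): 1250, date(1920, 3, 2): 1000, date(1920, 3, 3): 1450,
}


def month_to_date(sales_by_day, day):
    """Cumulative sales from the first of the month through the given day."""
    return sum(
        amount
        for d, amount in sales_by_day.items()
        if d.year == day.year and d.month == day.month and d <= day
    )


def summary_sheet(sales_by_day, day, month_estimate):
    """Compare one day and the month to date with the same dates a year earlier."""
    # Assumption: same calendar date. replace() would fail for 29 February,
    # a case this sketch does not handle.
    year_ago = day.replace(year=day.year - 1)
    return {
        "day": sales_by_day[day],
        "same_day_last_year": sales_by_day[year_ago],
        "month_to_date": month_to_date(sales_by_day, day),
        "month_to_date_last_year": month_to_date(sales_by_day, year_ago),
        "month_estimate": month_estimate,
    }


sheet = summary_sheet(sales, date(1920, 3, 3), month_estimate=40000)
for label, value in sheet.items():
    print(f"{label:>24}: {value}")
```

Each line of the sheet sets a current figure beside its standard, so that the manager may judge at a glance whether sales are going well.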

The  use  of  statistics  in  determining  such  standards  for 
measuring  current  performance  and  estimating  future 
performance  is  one  of  the  latest  developments  of  the  use  of 
statistics  in  business,  offers  one  of  the  most  profitable  in- 
struments for  improvement  in  managerial  methods,  and 
unfortunately  involves  some  of  the  greatest  dangers  of  mis- 
use. These  misuses  are  prevalent  in  current  practice. 
The  first  is  the  error  of  so  organizing  the  function  of  re- 
cording, classifying,  and  analyzing  data  as  to  secure  the 
returns  too  late  for  use  in  controlling  current  operations, 
in  which  case  the  statistics  are  but  records  of  past  per- 
formance and  have  so  limited  a  usefulness  as  to  raise  the 
question  whether  they  are  worth  the  cost  of  collection. 
The  second  error  is  that  the  units  of  enumeration  may  not 
be  homogeneous,  and  to  the  extent  that  they  are  not,  their 
value  in  the  control  of  current  practice  or  of  forecasting 
future  performance  is  invalidated.  A  time  (statistical 
unit)  of  a  performance  by  method  A  under  condition  B 
with material C on machine D is not homogeneous with a
time resulting from a study when either A, B, C, or D is
different.  Three  complaints,  one  resulting  from  disturbed 
mail  service,  one  resulting  from  a  defect  in  the  goods,  and 
one  resulting  from  discourtesy  of  a  clerk,  are  not  homo- 
geneous. To  record  them  simply  as  "complaints"  may 
enable  a  manager  to  enjoy  the  sensation  that  something 
is  wrong,  but  will  give  no  precise  information  which  will 
enable  him  to  control  the  situation  and  remedy  the  causes. 
The  third  error  in  the  use  of  statistics  in  establishing  stand- 
ards and  measuring  performance  is  that  the  units  of  sta- 
tistical record  may  not  correspond  to  the  units  of  the  op- 
erating processes.  This  is  a  common  error,  for  only  too 
frequently  the  statistical  function  is  not  recognized  as  a 
production  function,  and  the  statistical  department  and 
methods  are  developed  independently  of  the  production 
department  and  methods.  The  analysis  of  processes  by  the 
production  manager  for  the  purposes  of  operating  con- 
trol is  different  from  that  of  the  statistical  department 
for  purposes  of  record,  with  the  result  that  the  statistics 
fail  to  be  useful  to  the  production  manager.  The  same 
authority  that  approves  the  establishment  of  production 
methods  should  approve  the  establishment  of  statistical 
methods  in  so  far  as  they  are  concerned  with  statistics  of 
operation,  in  order  to  insure  that  the  units  of  statistical 
record  shall  be  identical  with  the  unit  process  of  production. 
Furthermore,  the  only  way  of  assuring  such  correspondence 
is  to  make  the  "papers"  which  control  production  the  orig- 
inal documents  from  which  statistical  data  are  drawn. 
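
The second error named above, that of recording unlike things under one head, may be illustrated with the complaints already mentioned. The records are hypothetical, and the three class names are merely those suggested by the text.

```python
from collections import Counter

# Hypothetical complaint records; each is classified by cause when recorded.
complaints = [
    "mail service",
    "defective goods",
    "clerk discourtesy",
    "defective goods",
    "mail service",
    "defective goods",
]

# A single undifferentiated total says only that something is wrong.
total = len(complaints)

# Counting by cause keeps each class homogeneous and points to a remedy.
by_cause = Counter(complaints)

print(f"complaints (all causes): {total}")
for cause, count in by_cause.most_common():
    print(f"  {cause}: {count}")
```

The total alone gives the manager no point of attack, whereas the count by cause shows him where to begin.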

3. To establish series of facts which suggest tendencies,
or  permit  comparisons  which  suggest  causal  relations,  or 
at  least  correlation,  between  series.  Time  curves  may  be 
plotted  showing  sales  —  by  salesmen,  by  territories,  by 
articles, etc. By these the sales manager may keep informed
concerning the sales tendency in a territory, of a
commodity,  or  of  a  salesman.  Comparison  of  these  curves 
may  permit  the  manager  to  determine  that  the  salesman 
whose  record  of  gain  is  best  is  concentrating  on  leaders 
which  yield  small  profit  while  a  salesman  whose  record 
for  gain  is  not  so  good  may  be  selling  a  wider  variety  of 
articles,  thereby  laying  the  foundation  of  a  better  long- 
run  business  in  his  territory.  Curves  of  wages  paid,  hours 
of  work,  output  per  man,  separations,  hirings,  cases  of 
discipline,  idle  machine  time,  etc.,  may  be  compared  and 
correlations  proved  —  i.e.  it  may  be  observed  that  when 
one  curve  shows  a  particular  tendency  another  shows  a 
similar  or  different  particular  tendency.  The  establishment 
of  such  correlations  permits  more  accurate  forecasting 
of  results  and  the  establishment  of  more  dependable  policies. 
There  is  opportunity  for  the  development  of  statistics  of 
this  kind  in  every  business  and  the  results  may  be  con- 
siderable, but  in  no  two  businesses  is  it  the  same,  and  each 
is  a  field  for  special  study. 
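
The comparison of two curves may be reduced to a single figure. The sketch below uses the Pearson coefficient of correlation, which is one conventional measure and is an assumption here, since the text prescribes no particular method. The three monthly series are hypothetical.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson coefficient of correlation between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # A series that never varies has zero variance and cannot be correlated;
    # this sketch lets the resulting ZeroDivisionError surface.
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly series for one plant.
weekly_hours = [60, 58, 56, 54, 52, 50]
separations = [40, 37, 35, 30, 28, 25]
idle_machine_hours = [10, 12, 9, 11, 10, 12]

r_hours = correlation(weekly_hours, separations)
r_idle = correlation(idle_machine_hours, separations)
print(f"hours of work vs separations:     r = {r_hours:+.2f}")
print(f"idle machine time vs separations: r = {r_idle:+.2f}")
```

A coefficient near +1 or -1 indicates that two curves move closely together or in opposition, and one near zero that they move independently. In either case the figure suggests a relation to be investigated and does not of itself establish a cause.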

There  are  many  data  pertaining  to  the  social-industrial 
conditions  in  which  a  business  is  carried  on,  of  importance 
to  every  manager  in  determining  policy,  but  to  collect, 
classify,  and  analyze  these  would  be  too  great  a  burden  of 
cost  for  one  business.  We  have  in  mind  data  relating  to 
crop  conditions,  prices  of  basic  materials  of  industry,  bank 
clearings,  commercial  failures,  etc.,  which  when  consoli- 
dated and  compared  throw  light  on  general  business  condi- 
tions. Statistics  of  this  sort  are  now  available  through 
statistical  service  agencies,  and  it  is  not  necessary  for  the 
individual  business  to  secure  them.  But  there  remains  a 
considerable  number  of  special  "lines"  of  statistics,  es- 
pecially pertinent  to  its  materials,  products,  and  markets, 
which  a  business  may  profitably  maintain. 


4. To determine laws governing industrial operations.
A  comparison  of  different  lines  of  statistics  might  disclose 
such  relations  as  to  prove  principles  to  which  the  term 
"law"  could  properly  be  applied.  Extraordinarily  large 
numbers  of  homogeneous  data  are  essential  to  the  estab- 
lishment of  laws.  These  are  seldom  available  in  the  records 
of  a  single  industrial  concern.  The  most  noteworthy  case 
of  the  scientifically  precise  observation,  recording,  classi- 
fication, analysis,  and  general  statistical  treatment  of  indus- 
trial data  which  has  led  to  the  formulation  of  laws,  was  the 
study  by  Mr.  Taylor  and  his  associates  which  led  to  the 
discovery  of  the  laws  of  metal-cutting,  which  revolutionized 
that  art.  The  hope  of  the  discovery  of  laws  governing 
industrial operations depends upon the pooling of the statistical
interests of many concerns — cooperative statistics
which will yield homogeneous data in great volume.

Cost  Accounting.  —  Cost  accounting  is  a  specialized 
phase  of  statistics.  It  is  statistics  in  which  the  statistical 
units  are  monetary  values  —  cents,  pence,  centimes.  The 
principles  of  the  statistical  treatment  of  these  units  are  no 
different  from  the  principles  of  the  treatment  of  other 
units  —  pounds,  gallons,  bushels.  Cost  statistics  are  sub- 
ject to  every  law  governing  general  statistics,  and  most  of 
the  troubles  in  cost  accounting  are  the  result  of  disregard 
of  statistical  laws.  Cost  statistics  should  be  derived  from 
operating  "papers";  these  papers  should  flow  in  a  con- 
stant stream  over  the  desks  of  cost  and  other  statistical 
clerks  and  keep  the  record  "up  to  the  day"  as  a  basis  for 
immediate  control  of  operations ;  the  cost  unit  data  should 
coincide  with  or  dovetail  into  the  unit  data  of  other  phases 
of  statistics ;  they  should  be  homogeneous.  Costs  which 
have  been  derived  in  accordance  with  these  principles  are 
worth the expense; costs which are but the record of past
events — records got up too late to influence current action
and  in  classes  which  do  not  correspond  to  classes  of  opera- 
tions in  the  shop  —  are  seldom  worth  the  expense  of  col- 
lection. 

Mechanical  Devices.  —  The  principal  obstacle  which  is 
met  in  the  development  of  the  cost  and  general  statistical 
methods  here  recommended  is  the  clerical  expense  involved. 
The  expense  of  copying  data  from  operating  forms  on  to 
special  statistical  department  forms,  and  then  of  computa- 
tions and  tabulation,  is  frequently  prohibitive.  But  it  is 
possible  to  adapt  the  cards  of  the  standard  sorting  and 
tabulating  machines  for  use  as  original  operation  orders, 
and  they  become,  after  their  use  in  operation,  the  data 
cards  of  the  cost  and  other  statistical  clerks.  One  firm  at 
least  has  economically  secured  extraordinary  results  in  this 
way.  The  economy  resulting  from  the  use  of  mechanical 
devices  and  the  exceptional  minuteness  and  value  of  the 
costs  and  other  statistics  derived  by  this  firm,  are  due  to 
the  fact  that  the  statistical  methods  are  tied  up  with  — 
are  a  function  of  —  the  good  management  methods. 

Graphical  Records.  —  Graphical  forms  of  recording  sta- 
tistical data  —  especially  summarizing  data  —  have  been 
found  desirable  by  all  well-organized  statistical  depart- 
ments. The  simple  curve  is  the  most  useful  graphical 
device.  It  has  properties,  not  characteristic  of  tables, 
which  aid  the  mind  in  detecting,  through  the  eye,  tend- 
encies and  relations.  There  are  firms  which  plot  and 
keep  posted  daily  as  many  as  1500  or  2000  curves. 

The  Statistical  Department.  —  The  statistical  function 
should  be  performed  by  specialized  clerks  trained  in  the 
methods  and  in  the  manipulation  of  mechanical  devices 
and in statistical operations. The manager of the department
should be above all a man of imagination and of
analytical ability. He should suggest, but he alone should
not  determine  what  statistics  should  be  kept  and  what 
objectives  aimed  at.  Statistics  are  for  use,  not  for  file. 
The  executive  and  the  administrative  officers  are  the  users. 
They  should  participate  in  determining  what  statistics 
should  be  kept.  Their  several  desires  should  be  dovetailed 
into  one  organic  body  of  statistical  records,  coordinated 
by  the  general  manager  through  the  management  engineer. 

General Information: A Supplementary Function. —
Statistics  is  a  method  of  investigation,  of  securing  informa- 
tion. It  is  logical  therefore  that  other  methods  of  secur- 
ing information  than  the  statistical  should  be  assumed 
by  the  statistical  department.  Special  libraries,  including 
files  of  books,  pamphlets,  trade  periodicals,  and  newspaper 
clippings,  of  which  all  the  important  contents  bearing  on 
the  business  are  indexed,  are  being  developed  by  statisti- 
cal departments.  The  department  should  take  the  initi- 
ative in  bringing  pertinent  information  to  the  attention 
of the administrative and executive officers; should establish
its information service within the plant.

Conclusion.  —  Statistical  results  secured  in  accordance 
with  correct  statistical  principles  and  methods  —  related 
to  operations,  posted  up  to  the  day,  based  on  homogeneous 
units  —  are  as  important  to  the  well-managed  manufactur- 
ing plant  as  are  the  sextant  and  the  compass  to  the  mariner. 
They  permit  the  management  to  know  at  any  moment 
where  it  is  and  to  set  its  course.  Statistics  which  yield 
only  records  of  past  events  are  of  no  more  use  than  the 
log  to  the  mariner ;  they  do  not  assist  one  to  shape  one's 
course.  Statistics  are  recorded,  classified,  and  analyzed 
experience.  From  this  experience,  so  made  available,  prin- 
ciples may  be  derived  to  guide  all  who  are  concerned  with 
the determination and execution of policies and with the
direction of operations — directors and president, general
manager,  production  and  sales  managers,  employment 
manager,  and  others  according  to  their  respective  problems. 
More  accurate  forecasting  of  conditions  will  be  possible 
and  more  precise  control  leading  to  desired  results ;  more 
reliable  forecasts  of  demand,  more  favorable  buying,  better 
selection  and  training  of  workers  and  retention  of  workers; 
more precise and dependable production methods; and a
better  schedule  of  production  throughout  the  year. 

REVIEW 

1.  Compare  the  definition  of  statistics  given  by  Mr.  Person 
with  that  given  in  the  Text.  What  have  they  in  common?  In 
what ways, if at all, are they different?

2.  As  in  question  1,  compare  the  definitions  of  statistical  methods. 
Wherein do the main difficulties lie in the use of statistical methods?
Is the emphasis different from, or the same as, that developed in
the Text? How?

3.  What  does  the  author  mean  by  homogeneity  of  statistical 
data?     Illustrate  in  other  fields. 

4. What does the author mean by "master facts behind the
mass  of  unit  facts"? 

5.  Enumerate  and  illustrate  the  "Practical  Objects  of  Statistics 
in  Business." 

6.  What  are  the  dangers  of  misuse  of  statistics  in  determining 
"standards  for  measuring  current  performance  and  estimating 
future  performance"?  Illustrate  these  further  from  your  own 
experience.     How  are  these  to  be  overcome  in  statistical  analysis? 

7.  On  what  types  of  statistics  should  a  business  concentrate 
its attention so far as collection is concerned, and for what types
may  it  look  to  outside  sources?  How  does  your  answer  as  a  general 
proposition  fit  your  own  particular  business  problems?      Illustrate. 

8.  How  does  the  author  define  "cost  statistics"?  Do  you 
agree?     Why? 

9.  Would  you  say  the  author  thinks  of  statistics  as  an  end,  or 
a  means  to  an  end?     Distinguish  the  two  points  of  view. 


10.  Does  the  writer's  treatment  support  the  conclusion  that 
"statistics  are  recorded,  classified,  and  analyzed  experience"? 
Would  you  think  it  necessary  in  any  way  to  expand  or  condition 
this  statement?     How? 

Scientific Methods — The Method of Investigation
in Relation to Business Cycles ¹

Beveridge ascribes crises to industrial competition,
May to the disproportion between the increase in wages
and in productivity, Hobson to over-saving, Aftalion to
the diminishing marginal utility of an increasing supply
of commodities, Bouniatian to over-capitalization, Spiethoff
to over-production of industrial equipment and under-production
of complementary goods, Hull to high costs
of construction, Lescure to declining prospects of profits,
Veblen to a discrepancy between anticipated profits and
current capitalization, Sombart to the unlike rhythm of
production in the organic and inorganic realms, Carver
to the dissimilar price fluctuations of producers' and consumers'
goods, Fisher to the slowness with which interest
rates are adjusted to changes in the price level.

One seeking to understand the recurrent ebb and flow
of economic activity characteristic of the present day finds
these numerous explanations both suggestive and perplexing.
All are plausible, but which is valid? None necessarily
excludes all the others, but which is the most important?
Each may account for certain phenomena;
does any one account for all the phenomena? Or can
these rival explanations be combined in such a fashion as
to make a consistent theory which is wholly adequate?

There is slight hope of getting answers to these questions
by a logical process of proving and criticizing the
theories. For whatever merits of ingenuity and consistency
they may possess, these theories have slight value
except as they give keener insight into the phenomena of
business cycles. It is by study of the facts which they
purport to interpret that the theories must be tested.

¹ Adapted with permission from Mitchell, Wesley C., "The Method of
Investigation," in Business Cycles, Chapter I, Sec. III, pp. 19-20.

But  the  perspective  of  the  investigation  would  be  dis- 
torted if  we  set  out  to  test  each  theory  in  turn  by  collect- 
ing evidence  to  confirm  or  to  refute  it.  For  the  point  of 
interest  is  not  the  validity  of  any  writer's  views,  but  clear 
comprehension  of  the  facts.  To  observe,  analyze,  and  sys- 
tematize the  phenomena  of  prosperity,  crisis,  and  depres- 
sion is  the  chief  task.  And  there  is  better  prospect  of 
rendering  service  if  we  attack  this  task  directly,  than  if  we 
take  the  roundabout  way  of  considering  the  phenomena 
with  reference  to  the  theories. 

This  plan  of  attacking  the  facts  directly  by  no  means 
precludes  free  use  of  the  results  achieved  by  others.  On 
the  contrary,  their  conclusions  suggest  certain  facts  to  be 
looked  for,  certain  analyses  to  be  made,  certain  arrange- 
ments to  be  tried.  Indeed,  the  whole  investigation  would 
be  crude  and  superficial  if  we  did  not  seek  help  from  all 
quarters.  But  the  help  wanted  is  help  in  making  a  fresh 
examination  into  the  facts. 

It is not feasible to make a study of all crises. . . . Not
only is the field too extensive to cover thoroughly, but the recorded
information is also too vague, too much confined
to the dramatic events of the crises, and too scanty concerning
the intervening phases of depression and prosperity.
Whatever chance there may be of bettering the work already
done lies in securing data more full and more precise
than the data heretofore employed. The minute examination
of a few business cycles therefore promises
better results than a general survey of many. Hence attention
will be concentrated upon those cycles concerning
which the fullest and most exact knowledge is available
— the cycles of the last two decades. By including England,
Germany, and France, as well as the United States,
a sufficient number of cases can be had to warrant generalizations.

The  materials  most  important  for  such  an  investigation 
are  the  current  reports  of  business  periodicals  and  the  sta- 
tistical records  of  business  activities.  Most  stress  must 
be  laid  upon  the  latter;  for  the  problems  to  be  dealt  with 
are  largely  problems  of  the  relative  importance  of  different 
factors, or of the general trend of diverse fluctuations.
Quantitative analysis of the phenomena is needed quite
as much as qualitative analysis. Since in his efforts to
make  accurate  measurements  the  economic  investigator 
cannot  devise  experiments,  he  must  do  the  best  he  can 
with  the  cruder  gauges  afforded  by  statistics. 

REVIEW 

1.  Business  cycles  occur;  the  explanations  given  for  them  do 
not agree. What is Mitchell's approach to a study of them? Does
his  method  appear  to  you  to  be  scientific?     Why? 

2. What are some of the "facts" to which Mitchell refers? Consult
his Business Cycles.

3.  Is  there  any  great  likelihood  that  there  will  be  an  agreement 
on  all  of  the  facts?  Can  the  results  of  the  facts  be  statistically 
measured?  How  about  speculative  instincts  —  "the  willingness 
to  take  a  chance"? 


The  Statistical  Method  of  Discovering  and 
Widening Markets *

To  take  the  place  of  the  old  rule  of  thumb,  catch-as- 
catch-can  method  of  selling,  which  is  gradually  passing 
into  the  discard,  there  is  appearing  a  real  desire  on  the  part 
of  industrial  leaders  to  make  scientific  analysis  of  their 
selling  efforts.  Imbued  with  this  desire,  the  manufacturer 
(or  jobber  or  retailer)  finds  that  it  is  no  such  simple  task 
to  acquire  the  knowledge  of  his  own  business  —  that  he 
formerly  thought  was  unnecessary  —  but  that  he  now 
believes he wants.

When  the  boss  learns  of  the  experience  of  other  con- 
cerns in  the  development  of  scientific  commercial  methods, 
he  begins  to  cast  around  in  his  own  organization,  trying 
to  get  information.  He  finds  that  his  own  manager,  per- 
haps, is  too  busy  to  think  of  the  questions  he  propounds, 
or that he has lived in the business so long that he can't
see  beyond  the  walls  of  the  factory  or  of  the  office.  The 
sales  manager  believes  that  everything  is  going  as  well  as 
could  be  expected,  and  the  boss  finds  that  he  has  little 
sympathy with his "new-fangled" ideas. He finds that
the  various  department  managers  are  too  engrossed  in 
the  details  of  their  own  narrow  fields  to  be  of  much  assist- 
ance. 

Sometimes  the  owner  finds  a  man  in  his  own  organization 
who  gets  the  right  "slant"  and  who  has  the  initiative  and 
the  breadth  of  vision  to  organize  for  collecting  the  infor- 
mation wanted.  Sometimes  the  advertising  manager  is 
the  man  who  fills  the  need.  But,  more  commonly,  if  the 
owner  is  persistent  enough,  he  looks  for  an  infusion  of  new 

*  Adapted  with  permission  from  Weld,  L.  D.  H.,  "A  Strong  Foundation 
for  Your  Advertising,"  in  Printers'  Ink,  January  9,  1919,  pp.  3-12. 


blood  —  perhaps  in  the  form  of  a  new  sales  manager  who 
has  had  experience  in  other  fields.  Sometimes,  however, 
he  decides  to  establish  a  new  department,  just  as  the  mana- 
ger of  a  manufacturing  plant,  when  he  introduces  scientific 
management,  finds  it  necessary  to  organize  a  separate  de- 
partment to  make  time  studies  and  to  do  the  planning. 

Thus  it  has  come  about  that  in  a  few  cases  there  have  been 
established  commercial  research  departments,  whose  duty 
it  is  to  collect,  tabulate,  and  interpret  information  about 
selling methods and results, and to plan methods for increasing
the effectiveness of the sales organization. Sometimes
this work is done fairly effectively by advertising
agencies; sometimes outside organizations, or "sales engineers"
are called in; but there is a growing feeling among
large  manufacturing  and  mercantile  concerns  that  in  order 
to  get  complete  and  substantial  service,  it  is  necessary  for 
them  to  have  investigating  and  planning  departments 
of  their  own,  and  that  there  is  a  permanent  place  for  such 
departments. 

The  larger  the  concern  the  greater  the  need  for  such  a 
department.  But  what  is  the  kind  of  information  wanted? 
What  are  the  features  of  sales  organization  and  methods 
that  are  beginning  to  demand  attention?  The  answers 
to  these  questions  indicate  in  general  the  function  of  a 
commercial  research  department. 

The  science  of  commercial  research  has  not  developed 
sufficiently  as  yet,  to  give  a  very  specific  answer  to  these 
questions.  The  functions  of  such  a  department  depend 
largely,  of  course,  on  the  nature  of  the  business,  and  the 
selling methods in use. In the case of a large business
with  different  departments  selling  a  variety  of  articles, 
the  functions  of  the  research  department  are  more  numerous 
than  in  the  case  of  a  smaller  concern  selling  a  single  product. 


The manufacturer of advertised and branded articles usually
has more need of a research department than the seller
of  unbranded  articles. 

Broad  Field,  but  Cultivation  Should  Be  Intensive 

The  fundamental  question  which  a  commercial  research 
department  faces  is  this  :  How  can  we  extend  the  market 
for  our  goods  ?  But,  in  order  to  answer  this  question  other 
questions  have  to  be  asked. 

Are  we  getting  the  best  results  from  our  present  selling 
activities  ? 

What  are  our  selling  costs  ? 

Is  our  distribution  even  throughout  the  country? 

What  share  of  the  business  are  we  getting? 

Are  the  salesmen  properly  trained  ? 

Are they paid in the best manner?

How  often  do  they  report,  and  what  do  they  report  ? 

How  thoroughly  are  salesmen's  reports  analyzed  ? 

How  well  do  salesmen  cover  their  territories,  and  are 
these  territories  laid  out  scientifically? 

Could  business  in  certain  sections  be  developed  by  es- 
tablishing branch  houses  carrying  stocks  of  goods? 

Then  there  are  other  questions  concerning  sales  policies 
and  price  policies.     Are  prices  maintained  by  dealers? 

Are  exclusive  dealers  used  ? 

Are  quantity  prices  allowed,  and,  if  so,  are  they  ad- 
justed properly? 

How  do  dealers  feel  toward  our  products  ? 

Are  dealers  sold  in  proper  quantities  ? 

How  many  different  competing  brands  do  dealers  handle  ? 

To  what  extent  do  consumers  ask  for  our  product  by  its 
brand  name? 


And  then  there  are  numerous  questions  to  be  asked  concern- 
ing the  advertising. 

These  are  only  a  few  of  the  questions  that  a  commercial 
research  department  might  be  called  on  to  answer.  It  is 
not  necessary,  however,  for  such  a  department  to  start 
out  by  trying  to  solve  all  the  problems  suggested  above. 
Rather  may  it  prove  more  useful  by  addressing  itself  to 
some  specific  problem. 

Perhaps the most important service that a commercial
research  department  can  perform  is  the  collection  of  infor- 
mation that  can  be  obtained  only  by  field  analyses  or 
market  surveys  —  that  is,  information  that  does  not  exist 
within  the  organization  in  any  form,  but  that  has  to  be 
gathered  from  the  outside.  The  only  members  of  the  or- 
ganization who  could  possibly  have  this  information,  or 
who  are  coming  in  contact  with  the  people  from  whom 
it  could  be  obtained,  are  the  salesmen. 

But  salesmen  can't  successfully  make  the  market  surveys 
necessary  in  scientific  selling  for  the  following  reasons : 
(1)  If  a  salesman  is  properly  routed  over  his  territory,  he 
cannot possibly have the time to collect the information
needed ;  (2)  The  salesman  has  a  personal  interest,  which 
blinds  him,  either  consciously  or  unconsciously,  to  facts 
that  would  place  his  work  in  an  unfavorable  light ;  and 
(3)  many  salesmen  are  lacking  in  a  broad  conception  of 
fundamental  merchandising  problems,  and  hence  they  fre- 
quently fail  to  grasp  the  significance  of  facts  which  would 
be  of  value  to  the  management. 

For  these  reasons,  market  surveys  need  to  be  made  by 
men  who  are  detached  from  the  regular  selling  force. 
Furthermore,  they  ought  to  have  a  training  in  the  funda- 
mentals of  business  organization.  They  ought  to  be  able 
to answer intelligently: Why does my firm sell through
jobbers,  rather  than  direct  to  retailers?  What  would 
be  the  advantages  of  selling  direct?  How  much  more 
would  it  cost?  How  much,  approximately,  does  it  cost 
to  sell  the  different  commodities  my  concern  is  marketing, 
and  what  is  the  relative  profitableness  of  the  different 
lines  ? 

Questions  Swift  and  Company  Are  Solving 

A  good  example  of  the  difficulties  surrounding  this  last 
question  is  a  problem  faced  by  Swift  and  Company.  This 
company  sells  a  variety  of  products  through  its  400  branch 
houses.  Branch-house  selling  costs  are  measured  as  so 
many  cents  per  hundred  pounds  —  lumping  together 
"Premium"  hams,  oxtails,  soap  powder,  eggs,  oleomar- 
garine, etc.  Just  how  much  it  costs  to  sell  soap  powder 
as  compared  with  "Premium"  hams,  can  never  be  de- 
termined exactly,  but  approximations  can  be  made  by  con- 
sidering amount  of  salesman's  time  necessary  to  sell,  rate 
of  turnover,  amount  of  storage  space  required,  etc. 
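
The kind of approximation described above might be sketched as a weighted allocation of a pooled selling cost. This is only an illustration: the product figures, the three factor weights, the use of days in stock as a stand-in for rate of turnover, and the function name allocate_selling_cost are all assumptions, not figures or methods reported for Swift and Company.

```python
def allocate_selling_cost(total_cost, factors, weights):
    """Split a pooled selling cost among product lines.

    factors: {product: {factor_name: amount}} (hypothetical measurements)
    weights: {factor_name: weight}, assumed to sum to 1
    """
    # Total of each factor across all products, used to form shares.
    totals = {f: sum(p[f] for p in factors.values()) for f in weights}
    shares = {}
    for product, amounts in factors.items():
        # A product's share is the weighted mean of its factor shares.
        shares[product] = sum(
            weights[f] * amounts[f] / totals[f] for f in weights
        )
    return {product: total_cost * s for product, s in shares.items()}


# Hypothetical data for one branch house (not from the source).
factors = {
    "hams": {"salesman_hours": 60, "storage_sqft": 300, "days_in_stock": 10},
    "soap_powder": {"salesman_hours": 40, "storage_sqft": 100, "days_in_stock": 30},
}
weights = {"salesman_hours": 0.5, "storage_sqft": 0.25, "days_in_stock": 0.25}

costs = allocate_selling_cost(1000.0, factors, weights)
for product, cost in costs.items():
    print(f"{product}: ${cost:.2f}")
```

The result depends entirely on the weights chosen, which is why such figures can only be approximations and never exact costs.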

This  suggests  another  of  Swift  and  Company's  selling 
problems.  Goods  are  distributed  partly  through  branch 
houses  and  partly  by  means  of  "car  routes."  Car  route 
distribution means the supplying of retailers in small towns
direct  by  drop  shipment  from  refrigerator  cars  that  are  sent 
out  from  the  packing  plants  at  regular  intervals,  each  car 
serving  the  dealers  in  a  dozen  or  more  towns  along  a  line  of 
railroad. 

This  question  frequently  arises :  Shall  a  certain  town 
be  served  by  a  car  route  or  shall  it  be  served  by  a  near-by 
branch house, or is the town large enough to have a branch
house of its own? Only when one gets beneath the surface,
can he begin to realize the complexities of this problem,
especially when a perfectly commendable business
rivalry  and  jealousy  between  the  two  departments  in- 
volved has  precluded  the  development  of  a  scientific  method 
of  answering  this  question  when  it  arises.  This  is  only 
one  of  many  instances  that  suggest  the  possibilities  of  a 
commercial  research  department  for  such  a  large  concern 
as  Swift  and  Company. 

Even  if  market  analyses  and  surveys  are  the  main  ob- 
ject of  a  commercial  research  department  there  are  cer- 
tain statistical  analyses  of  existing  facts  and  figures  which 
should  be  made  first.  There  are  very  few  concerns  that 
have  analyzed  to  their  fullest  possible  usefulness,  the  fig- 
ures that  they  already  have  in  their  own  records.  Many 
firms  have,  within  the  past  few  years,  forced  their  salesmen 
to  go  to  the  trouble  of  making  daily  instead  of  weekly  re- 
ports, and  then  have  not  themselves  taken  the  trouble  to 
make  proper  use  of  the  information  furnished  by  such  re- 
ports. .  .  . 

Analysis  of  either  existing  facts,  or  of  facts  that  have 
to  be  obtained  by  means  of  field  surveys,  calls  for  a  knowl- 
edge of  statistical  methods.  The  construction  of  aver- 
ages, of  per  capita  sales  by  states,  etc.,  offers  many  pit- 
falls to  the  uninitiated.  Improper  statistical  analysis 
may do more harm than good. . . .
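
One such pitfall in per capita figures can be shown with a small sketch. The two states and their sales and populations are invented for illustration. The point is only that a simple average of state ratios and a ratio of totals answer different questions.

```python
# Hypothetical state figures (sales in dollars, population in persons).
states = {
    "State A": {"sales": 50_000, "population": 1_000_000},
    "State B": {"sales": 30_000, "population": 100_000},
}


def per_capita(sales, population):
    return sales / population


# Pitfall: a simple average of the state ratios treats a small state
# and a large state as equally important.
simple_mean = sum(
    per_capita(s["sales"], s["population"]) for s in states.values()
) / len(states)

# Weighted figure: total sales divided by total population.
overall = per_capita(
    sum(s["sales"] for s in states.values()),
    sum(s["population"] for s in states.values()),
)

print(f"simple mean of state ratios: {simple_mean:.4f}")
print(f"overall per capita sales:    {overall:.4f}")
```

With these invented figures the simple mean is more than twice the overall per capita figure, because the small state with heavy sales counts as much as the large one.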

Use  of  Graphic  Charts 

One  of  the  most  valuable  things  that  a  research  depart- 
ment can  do  in  a  statistical  way  is  to  present  its  analyses 
in  the  form  of  graphic  charts.  Curves  representing  sales 
by  weeks  or  months  are  invaluable.  The  writer  believes 
that  the  common  practice  of  comparing  "last  week's  sales" 
with the sales of the "corresponding week previous year,"
is  hardly  sufficient  to  give  an  accurate  picture  of  sales  de- 
velopment. The  sales  of  the  different  products  should  also 
be  graphed.  The  seasonal  variations  should  be  studied 
—  and  for  different  sections  of  the  country.  Then  these 
things  should  be  compared  with  the  methods  of  routing 
salesmen,  the  possible  effect  of  changes  in  advertising  pol- 
icy, etc.  "Graphic  control"  of  industry  is  becoming  rec- 
ognized more  and  more.  It  saves  the  time  of  executives, 
and  it  gives  them  a  broader  view  of  their  business  prob- 
lems. .  .  . 
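
A study of seasonal variation of the kind recommended above could begin with something like the following sketch. The quarterly sales are invented, and the simple method of averages used here ignores any underlying growth in sales. It is offered as one possible starting point, not as the writer's own procedure.

```python
from statistics import mean

# Hypothetical quarterly sales for two years (not from the source).
sales = {
    1918: [100, 140, 120, 200],
    1919: [120, 160, 150, 250],
}


def seasonal_indexes(sales_by_year):
    """Average each quarter across years, then express it as a
    percentage of the average quarter (100 = a typical quarter)."""
    years = list(sales_by_year.values())
    quarter_means = [mean(q) for q in zip(*years)]
    overall = mean(quarter_means)
    return [100 * q / overall for q in quarter_means]


indexes = seasonal_indexes(sales)
for quarter, index in enumerate(indexes, start=1):
    print(f"Q{quarter}: {index:.1f}")
```

An index well above 100 marks a quarter that is normally strong, so that a comparison with the same quarter of the previous year is not mistaken for growth or decline.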

Market  surveys  may  cover  either  dealers  or  consumers, 
or  both.  Consumer  surveys  are  necessarily  the  more 
costly,  in  that  they  require  more  time  and  a  larger  corps 
of  investigators.  Much  information  about  consumers  may, 
of  course,  be  obtained  from  dealers. 

It  is,  of  course,  not  necessary  to  visit  all  retailers  or  all 
consumers in the country! The method of "sampling"
may  be  used.  Typical  communities  in  different  parts  of 
the  country  should  be  carefully  selected.  After  the  returns 
have  begun  to  come  in  and  are  tabulated,  it  is  possible 
for  the  analyst  to  determine  how  comprehensive  the  sur- 
vey must  be  in  order  to  make  it  yield  accurate  and  de- 
pendable results.  When  the  returns  from  different  com- 
munities begin  to  check  with  each  other  and  show  the 
same  tendencies  or  explainable  differences,  this  is  an  indi- 
cation that  dependable  results  are  being  obtained.  When 
they  show  irreconcilable  and  unexplainable  differences, 
this  is  an  indication  that  the  survey  is  not  comprehensive 
enough  to  bring  forth  trustworthy  fundamentals. 
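
The check described above, in which returns from different communities are compared as they come in, might be sketched as follows. The towns, the counts, and the tolerance of ten percentage points are hypothetical. A mechanical test of this sort cannot judge whether a difference is "explainable," which remains a matter for the analyst.

```python
def community_rates(returns):
    """returns: {community: (yes_answers, dealers_visited)}"""
    return {c: yes / n for c, (yes, n) in returns.items()}


def returns_agree(returns, tolerance=0.10):
    """True when the highest and lowest community rates differ by no
    more than `tolerance` (a hypothetical threshold, chosen here only
    for illustration)."""
    rates = community_rates(returns).values()
    return max(rates) - min(rates) <= tolerance


# Hypothetical tabulations: dealers reporting that customers ask for
# the product by its brand name.
first_returns = {"Town A": (12, 40), "Town B": (30, 50)}
later_returns = {"Town A": (33, 100), "Town B": (36, 100), "Town C": (30, 100)}

print("first returns agree:", returns_agree(first_returns))
print("later returns agree:", returns_agree(later_returns))
```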

In planning a survey, a list of questions should be drawn
up  as  carefully  as  possible,  worded  in  such  a  way  as  to  be 
answerable  in  the  easiest  possible  way.  Whenever  possible, 
questions should be asked in such a way as to be answered
by "Yes" or "No" or by some figure. A list of
questions  should  be  tried  out  before  the  final  form 
is  adopted.  The  man  in  charge  of  the  investigation  should 
do  some  of  the  field  work  himself,  in  order  to  be  able  better 
to  interpret  the  results,  and  to  understand  the  difficulties 
of  the  investigators.  The  question  should  be  printed  on 
forms  of  convenient  size  and  shape  and  on  good  enough 
paper  to  be  easily  handled  and  read.  .  .  . 

From  dealers,  the  manufacturer  wants  to  know  how 
many lines of competing goods are carried; what percentage
of the business he is getting; whether consumers
ask  for  the  article  by  its  name ;  whether  dealers  push  cer- 
tain brands,  and  why;  why  goods  are  returned;  whether 
store  signs  and  dealer  helps  are  used,  etc.,  etc. 

From  the  consumer  the  manufacturer  wants  to  know 
why  she  buys,  or  why  she  doesn't  buy,  his  product; 
whether  retailers  try  to  get  her  to  buy  a  substitute; 
whether she likes the color and the appearance; how often
she buys, or why she doesn't buy, his brand, etc., etc.

This  is  the  kind  of  information  that  can  be  obtained 
in  the  best  possible  way  only  by  a  commercial  research 
department.  There  are  also  many  problems  in  connection 
with  advertising  methods  and  copy  that  can  be  solved 
only  by  personal  contact  with  dealers  and  consumers ;  and 
it  should  be  the  duty  of  this  department  to  help  in  the 
analysis  of  advertising  results,  and  to  check  up  the  agency 
on  the  choice  of  mediums,  etc. 

From  the  foregoing  analysis  it  would  seem  that  there 
are  enough  things  for  a  commercial  research  department 
to  do,  and  there  is  probably  not  a  single  one  in  existence 
that  has  tackled  half  the  things  enumerated.  The  usual 
experience  has  been,  so  far  as  the  writer  knows,  that  such 
a department has found itself so busy with just a few
specific  problems,  that  it  has  proved  its  usefulness  even 
within restricted fields, and has unbounded possibilities
ahead. 

In  conclusion  let  it  be  said  that  in  many  industries  there 
are  still  other  problems  than  those  mentioned  above,  to 
which  a  research  department  may  well  address  itself.  And 
these  are  some  of  the  most  vital  problems  of  the  day.  These 
have  to  do  with  the  broad  and  fundamental  relations  of  an 
industry  with  the  public  and  with  the  Government.  The 
economics  of  any  industry  are  well  worth  studying.  Just 
what  economic  function  does  any  particular  industry  per- 
form? How  is  it  a  benefit  to  mankind?  To  what  extent 
is  it  misunderstood  by  the  public?  How  can  its  service 
be improved? What is its policy in dealing with the public
and with its own working people? . . .

REVIEW 

1.  Has  the  author  a  scientific  viewpoint  respecting  advertising 
and  market  extension?     If  so,  why?     If  not,  why  not? 

2.  What  is  meant  by  commercial  research?  In  studying  mar- 
kets, what  kinds  of  questions  must  a  commercial  research  depart- 
ment ask  in  order  to  be  "scientific"?  Where  and  through  whom 
are  answers  to  such  questions  secured  ? 

3.  In  what  way  is  Swift  and  Company  in  need  of  a  commercial 
research  department?  Would  your  answer  apply  equally  well  to 
all  types  of  businesses?  Can  you  defend  the  establishment  of  a 
research department in a country bank, in a grocery business?
Is  the  size  of  the  department  alone  significant  in  the  application 
of  "scientific  method"? 

4.  Can  "guesses,"  to  which  exception  is  taken,  be  scientific? 

5.  To  whom  should  market  surveys  extend?  Compare  the  dis- 
cussion of  "sampling"  as  here  treated  with  the  discussion  in  the 
Text. 


CHAPTER  II 
SOURCES  AND  COLLECTION  OF  STATISTICAL  DATA 

Statistics of Unemployment *

Statistical  information  as  to  unemployment  in  the 
United  States  is  less  adequate  and  reliable  than  that  as  to 
almost  any  other  social  problem.  The  federal  govern- 
ment, several  of  the  states,  and  various  other  agencies 
have  made  censuses  of  the  unemployed  from  time  to  time, 
but  in  the  greater  number  of  cases  the  data  thus  secured 
are  of  little  value.  .  .  . 

The  sources  of  statistical  information  as  to  unemploy- 
ment among  trade  unionists  are  the  publications  of  the 
state  departments  of  labor  and  of  the  trade  unions.  .  .  . 

The  New  York  Department  of  Labor  has  collected  since 
March,  1897,  statistics  of  unemployment  among  the  trade 
unionists  of  that  state.  From  1897  to  1914  it  collected 
semi-annually,  from  all  the  trade  unions,  information  as 
to  the  number  of  members  employed  and  unemployed 
on  the  last  working  days  of  March  and  September,  the 
causes  of  such  unemployment,  the  number  of  members  idle 
throughout  the  first  and  third  quarters  of  the  year,  and  the 
number  of  days  which  each  member  worked  during  these 
periods.  The  supply  of  this  information  was  made  com- 
pulsory by  law.  Since  December,  1901,  the  New  York 
Department  has  selected  certain  local  unions  in  each  trade 

*  Adapted  with  permission  from  Smelser,  D.  P.,  "  Unemployment  and 
American Trade Unions," Johns Hopkins University Studies, Series XXXVII,
No.  1,  1919,  pp.  9-32. 

and  industry  from  which  it  has  secured  monthly  returns 
as  to  unemployment.  It  has  attempted  to  select  local 
unions which have reliable and intelligent secretaries, to
have  each  trade  represented  in  proportion  to  the  number 
of  workmen  engaged  in  each  class,  and  to  maintain  the 
same  proportionate  representation  from  month  to  month 
so  that  the  data  may  be  comparable. 
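
Representation "in proportion to the number of workmen engaged in each class" can be illustrated by a small quota calculation. The trades, the numbers of workmen, and the sample size below are invented, and the largest-remainder rounding is simply one convenient rule. The source does not say how the New York Department made its selection.

```python
def proportional_quota(workmen_by_trade, sample_size):
    """Allot `sample_size` reporting members among trades in proportion
    to the number of workmen in each (largest-remainder rounding)."""
    total = sum(workmen_by_trade.values())
    exact = {t: sample_size * n / total for t, n in workmen_by_trade.items()}
    quota = {t: int(x) for t, x in exact.items()}
    # Hand out the members lost to rounding, largest fraction first.
    leftover = sample_size - sum(quota.values())
    by_fraction = sorted(exact, key=lambda t: exact[t] - quota[t], reverse=True)
    for trade in by_fraction[:leftover]:
        quota[trade] += 1
    return quota


# Hypothetical numbers of workmen by trade (not from the source).
workmen = {"building": 50_000, "clothing": 30_000, "printing": 15_000}
quota = proportional_quota(workmen, 1_000)
print(quota)
```

Holding these proportions fixed from month to month is what makes one month's percentage comparable with the next.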

Both  classes  of  statistics  are  of  doubtful  value.  The 
secretaries  of  the  local  unions  in  many  cases  had  no  means 
by  which  they  could  determine  the  actual  number  employed 
and  unemployed,  and  consequently  they  resorted  to  rough 
estimates.  Further,  there  was  a  tendency  to  exaggerate 
the  amount  of  unemployment  in  the  hope  that  this  would 
favorably  affect  public  opinion.  These  defects  were  es- 
pecially inherent  in  the  data  collected  semi-annually  from 
all  unions,  and  for  this  reason  the  collection  of  this  class  of 
data  was  discontinued  in  1914.  The  data  relating  to  se- 
lected unions  are  defective  in  many  respects,  but  it 
is  thought  that,  while  they  are  of  no  great  value  as  regards 
the  actual  amount  of  unemployment,  they  are  of  con- 
siderable importance  in  making  apparent  the  movements 
in  the  state  of  employment  from  month  to  month  and  from 
year  to  year.  .  .  . 

The  Massachusetts  Bureau  of  Statistics,  since  March, 
1908,  has  collected  data  as  to  unemployment  from  trade 
unions  situated  in  that  state.  This  information  is  com- 
parable, in  many  respects,  to  that  collected  by  the  New 
York  Department.  In  Massachusetts  information  as  to 
unemployment  is  secured  only  from  those  unions  which 
desire  to  report  their  working  conditions.  However,  the 
majority  of  the  trade-union  membership  is  represented 
in  the  returns.  Thus,  for  the  quarter  ending  September 
30, 1915, returns were made by 1052 local unions representing
175,754 organized wage earners, or approximately
75  per  cent  of  the  trade-union  membership  of  the  State. 
Monthly returns are not made by any of the unions, reports
being made only for the last working days of the four
quarters  of  the  year  by  the  secretaries  of  the  local  unions. 
The  returns  are  scrutinized  by  the  bureau's  experts  and  if 
any  errors  are  apparent  the  schedules  are  returned  for  cor- 
rection. .  .  . 

The  New  Hampshire  Bureau  of  Labor  is  the  only  other 
state  bureau  which  has  collected  statistics  of  unemploy- 
ment among  organized  wage  earners,  and  these  statistics 
are  practically  valueless  as  they  give  only  the  percentages 
of  members  unemployed  throughout  the  first  and  second 
quarters  of  1915.  It  seems  that  the  secretaries  of  the 
local  unions,  in  most  cases,  were  unable  to  accurately  re- 
port such  information. 

A  number  of  the  American  trade  unions  have  attempted 
to  collect  statistics  of  unemployment  of  their  members. 
Generally  these  attempts  have  failed,  either  because  the 
secretaries  of  the  local  unions  refused  to  report  conditions 
accurately,  or  because  the  secretary  of  the  national  union 
failed  to  recognize  the  importance  of  the  statistical  infor- 
mation as  to  unemployment.  The  unions  have  the  op- 
portunity of  collecting  such  material  at  small  expense. 
In  all  unions  the  secretaries  of  the  subordinate  branches 
make  monthly  reports  to  headquarters  concerning  various 
subjects,  and  where  statistical  information  as  to  unem- 
ployment has  been  collected  these  monthly  reports  have 
generally  been  utilized  for  this  purpose. 

The  American  Federation  of  Labor  collected  from  1899 
to  1908  data  relating  to  unemployment  among  members 
of its affiliated unions. The number of workmen represented
in the returns varied as much as 800 per cent from
one  month  to  another  in  the  same  year,  and  as  the  reports 
were  made  by  the  secretaries  of  the  national  unions  it  is 
obvious  that  the  data  secured  were  not  accurate.  For  this 
reason  the  collection  of  this  information  was  discontinued 
in  1909. 

The  Wisconsin  State  Federation  of  Labor  has  collected 
statistics  of  unemployment  from  its  affiliated  unions  since 
1912.  The  information  collected  in  1912  was  worthless 
and that for the two succeeding years was far from satisfactory.
In 1913 the affiliated unions were requested to
report  the  percentages  of  members  unemployed  on  Sep- 
tember 1.  Returns  were  made  by  243  local  unions  with  a 
total  membership  of  19,921.  Of  these,  1436  members,  or  7.2 
per  cent,  were  reported  as  idle.  This  percentage  is  but  four 
tenths  of  one  per  cent  higher  than  that  of  Massachusetts 
for  September  30  of  the  same  year,  while  it  is  12.8  lower 
than  the  New  York  percentage  for  August  31. 
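
The Wisconsin percentage can be checked from the figures given. The sketch below also works out the Massachusetts and New York percentages that the comparison implies, reading "12.8 lower" as percentage points. Those two derived values are inferences from the passage and are not stated in it.

```python
def percent_idle(idle, membership):
    """Members reported idle as a percentage of members reporting."""
    return 100 * idle / membership


# Wisconsin, September 1, 1913: figures as given in the text.
wisconsin = round(percent_idle(1436, 19_921), 1)

# Percentages implied by the comparisons in the text (inferred here).
massachusetts = round(wisconsin - 0.4, 1)
new_york = round(wisconsin + 12.8, 1)

print(f"Wisconsin:     {wisconsin} per cent")
print(f"Massachusetts: {massachusetts} per cent (implied)")
print(f"New York:      {new_york} per cent (implied)")
```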

A few unions have realized the benefits accruing from the
collection  of  statistical  information  as  to  unemployment 
and  have  accordingly  provided  in  their  constitutions  that 
the  local  union  secretaries  shall  report  the  state  of  employ- 
ment at  specified  periods.  For  example,  the  Potters, 
Plumbers, Boilermakers, Iron Molders, Lithographers,
Elevator Constructors, and Metal Polishers require the
secretaries  of  their  subordinate  unions  to  report  either 
monthly  or  quarterly  the  number  of  members  employed 
and  unemployed.  But  little  attention  is  paid  by  the  secre- 
taries to  these  provisions,  and  in  the  unions  where  the  in- 
formation is  reported  it  is  neither  used  by  the  general  secre- 
taries nor  compiled  for  publication. 

The  Painters,  Paperhangers,  and  Decorators,  at  their 
convention  in  1913,  provided  that  an  official  "time  book" 
should be issued to each member of the union, who was to
record  in  it  all  time  lost  through  unemployment  and  the 
causes  of  such  idleness,  and  report  quarterly  to  his  local 
union.  The  secretaries  of  the  subordinate  branches  were 
instructed  to  compile  these  reports  and  send  them  to  the 
national  union.  It  was  thought  that  much  valuable  in- 
formation could  thus  be  secured.  Considerable  light  would 
have been thrown upon the question of variation in unemployment
among localities. However, it was found impossible
to secure the desired information from the members
except  through  a  system  of  fines,  which,  of  course,  would 
have  had  a  tendency  to  produce  inaccurate  statistics.  Con- 
sequently, these  time  books  are  used  in  only  a  few  unions. 
It  is  understood  that  the  Chicago  local  union  has  collected 
statistics  of  unemployment  from  its  members  for  five  or 
six  years.  It  was  reported  at  the  convention  in  1913  that 
the  data  collected  in  the  two  previous  years  indicated  that 
the  average  painter  lost  ninety-eight  working  days  each 
year  through  inability  to  secure  work. 

The  Glass  Bottle  Blowers  have  collected  and  privately 
published statistical information as to unemployment among
its members for several years. But in consequence of the
fact that no distinction is made between the members totally
unemployed and those working as "spare men" this information
is of little value. There is also available in the
monthly  journals  of  the  Wood  Carvers  data  as  to  the  num- 
ber of  members  employed  and  unemployed  on  the  last  work- 
ing day  of  the  month.  Percentages  of  unemployment  have 
been calculated for the period 1909-1915, and there is little
fluctuation  in  them  from  month  to  month  and  from  year 
to  year,  the  rate  of  unemployment  ranging  between  twenty 
and  twenty-five  per  cent.  This  would  seem  to  indicate 
that  the  returns  are  not  accurate  but  mere  estimates  of  the 
secretaries.  .  .  . 


In view of the fact that so little attention has been
given  to  the  collection  of  data  as  to  unemployment  in  the 
United  States  before  1900,  it  is  rather  surprising  to  find 
that  the  Bricklayers'  Union,  organized  in  1865,  collected 
semi-annually  statistics  of  unemployment  from  1882  to  1911 
and  monthly  thereafter.  These  statistics  are  based  upon 
the  reports  by  the  local  secretaries  of  the  number  of  mem- 
bers employed  and  unemployed.  Not  all  of  the  unions 
reported,  as  some  were  always  in  a  state  of  disorganization 
or  were  involved  in  labor  disputes ;  but  the  reports  are 
fairly  representative  of  the  entire  membership,  and  the 
average  percentage  of  the  membership  included  in  the 
data  for  the  period  1882-1911  is  79.1.  There  is  no  reason 
to  believe  that  those  unions  which  are  not  represented  in 
the  returns,  except  the  few  on  strike,  had  more  or  less  un- 
employment than  the  average  of  those  reporting.  The  re- 
turns unfortunately  include  members  who  were  reported 
as  unemployed  on  account  of  labor  disputes  and  illness. 
Of  course  the  inclusion  of  these  members  has  produced  high 
percentages  of  unemployment. 

Another  important  question  is  whether  the  secretaries 
correctly  reported  the  number  of  the  unemployed.  Secre- 
taries of  unions  having  less  than  fifty  members  could  easily 
determine  the  number  of  unemployed,  since  they  generally 
knew  the  places  where  members  were  at  work ;  but  in 
unions  with  a  larger  membership  —  many  of  the  local  unions 
have  from  100  to  7000  members  —  the  secretaries  were 
unable  to  make  exact  returns  from  their  own  knowledge. 
In  such  cases  the  secretaries  either  based  their  returns  upon 
rough  estimates  or  upon  the  reports  of  the  stewards.  It 
is  impossible  to  determine  the  extent  to  which  the  stewards' 
reports were used. It would not have been difficult to ascertain
the exact number of members employed on a given
day  if  these  reports  had  been  used,  because  each  week  the 
stewards  on  the  various  jobs  reported  the  names  of  all 
members working on particular days. The reports are
supposed  to  give  the  number  of  members  employed  and 
unemployed on the last working days of June and December;
but it is understood that frequently the returns
were based upon the condition of trade slightly before and
after  these  dates.  .  .  . 

The  Flint  Glass  Workers  have  collected  quarterly  statistics 
of  unemployment  since  1907,  but  the  data  are  fragmentary 
from  1907  to  1912.  In  1913  the  union  also  included  in 
its  inquiry  questions  as  to  the  number  of  members  who 
were  unemployed  at  the  trade,  but  who  had  secured  tem- 
porary employment  in  other  lines  of  industry.  Accord- 
ingly, the  local  unions  were  requested  to  report  the 
number  of  members  employed  at  the  trade,  the  number 
holding honorary membership, disabled, and working outside
the trade, and the number of those who were willing
and  able  to  work  but  had  not  found  employment  of  any 
kind. 

The  fact  that  many  workmen  secure  subsidiary  em- 
ployment when  they  are  unable  to  secure  employment  at 
their principal occupations is a factor that has frequently
been  overlooked  in  discussions  of  unemployment  statis- 
tics. The  fact  that  the  unions  in  a  particular  trade  re- 
port that  30  per  cent  of  their  members  were  unemployed 
on  a  certain  day  should  not  be  construed  to  indicate  that 
30  per  cent  of  their  members  were  not  working,  but  that 
30  per  cent  were  not  engaged  at  their  principal  occupa- 
tion. This  defect  in  trade-union  statistics  of  unemploy- 
ment is  due  to  the  fact  that  the  secretary  of  a  local  union 
estimates  the  percentages  of  unemployment  with  the  idea 
that the information which is most desirable is that relating
to the number of members who are unable to secure employment
under the jurisdiction of the union.

Statistical  information  as  to  unemployment  among  the 
members  of  the  Pattern  Makers'  Union  is  available  for  each 
month  since  April,  1907.  These  data  have  been  secured 
from  the  reports  of  the  local  union  secretaries  to  the  na- 
tional president  who  compiles  the  statistics  for  private 
use  and  for  publication.  The  secretaries  are  instructed 
to  "give  the  exact  number  of  members  unemployed  at  the 
end  of  the  month"  and  the  membership  of  the  local  unions. 
These  statistics  are,  of  course,  open  to  the  same  criticism 
as  those  of  the  New  York  Department  of  Labor  and  Massa- 
chusetts Bureau  of  Labor,  but  they  are  greatly  superior 
to  the  statistics  collected  by  trade  unions  that  have  here- 
tofore been  considered.  In  January,  1915,  forty  of  the 
sixty-five  local  unions  of  the  Pattern  Makers  had  less  than 
fifty  members  each.  As  was  stated  above,  the  secretaries 
of  local  unions  with  few  members  are  able  to  determine  the 
number  of  unemployed  from  personal  knowledge.  More- 
over, several  of  the  larger  unions,  two  of  which  comprise 
over  20  per  cent  of  the  entire  membership,  pay  out-of- 
work  benefits,  and  all  of  the  local  unions  furnish  out-of-work 
stamps  free  to  the  unemployed,  so  that  their  secretaries,  un- 
like those  of  most  unions,  have  the  opportunity  of  ascer- 
taining the  exact  number  of  unemployed  members  with 
but  little  difficulty.  The  president  of  the  union,  too,  takes 
great  interest  in  the  returns  and  where  a  local  union 
attempts  to  conceal  a  good  condition  of  trade  by  the  re- 
turn of  an  exaggerated  number  of  unemployed,  does  not 
hesitate  to  correct  the  error.  However,  President  Wil- 
son states  that,  although  the  greater  number  of  unions 
make fairly accurate returns, some associations overestimate
the number of unemployed for the purpose of deterring
the traveling members from transferring to them. Thus,
in  January,  1915,  he  pointed  out  that  "one  association 
this  month  reports  that  20  per  cent  of  its  members  are  out 
of  work  while  the  truth  is  that  all  of  its  members  are  em- 
ployed, and  another  union  reports  just  about  three  times 
as  many  as  are  really  idle."  As  with  the  other  data  as  to 
unemployment  in  trade  unions,  these  figures  include  those 
unemployed  from  all  causes.  .  .  . 

One  of  the  most  important  conclusions  to  be  drawn 
from  the  statistics  of  unemployment  relates  to  the  very 
great  differences  in  the  amount  of  unemployment  among 
localities.  The  dominant  industries  of  any  two  States 
are  rarely  the  same,  or  even  if  the  same,  the  proportions 
of  workmen  employed  in  the  various  industries  are  gen- 
erally different.  It  is  certainly  true,  for  example,  that 
the  chief  occupations  of  the  workmen  included  in  the  Massa- 
chusetts returns  are  not  identical  with  those  of  the  work- 
men represented  in  the  New  York  data.  Even  where  the 
industries  are  the  same  in  two  States  certain  local  pecu- 
liarities may  affect  the  seasonal  fluctuations  and  produce 
more  unemployment  in  one  State  than  in  another.  .  .  . 

Not  only  are  the  fluctuations  in  employment  in  the  in- 
dustries of  two  States  taken  as  a  whole  often  quite  different, 
but  it  frequently  happens  that  the  seasonal  fluctuations 
in the same industry are different in two States. This
arises  chiefly  out  of  climatic  conditions  although  various 
local  peculiarities  play  a  large  part.  Thus,  when  the  state 
of  employment  in  the  building  trades  of  New  York  City 
is  poor,  Philadelphia  may  be  erecting  a  number  of  large 
buildings  and  may  need  additional  workmen.  Indeed  it 
may  be  said  that  the  state  of  employment  in  certain  trades 
is  affected  more  by  purely  local  variations  than  by  seasonal 
and cyclical fluctuations. It will occasionally happen that
in  a  particular  city  more  building  will  be  done  during  the 
winter  than  was  done  in  the  preceding  summer.  Even  taking 
the  labor  market  as  a  whole,  the  state  of  employment  varies 
as  much  from  one  city  to  another  as  it  does  from  one  sea- 
son to  another.  This  fact  is  shown  by  the  reports  of  the 
Massachusetts  Bureau  of  Statistics  on  the  state  of  employ- 
ment in  the  various  cities  of  the  State.  In  March,  1915, 
for  example,  the  percentage  of  unemployment  for  the  entire 
State  was  16.6 ;  in  Boston,  it  was  13.9 ;  in  Brockton,  27.6 ; 
in  Holyoke,  25.2 ;  in  Lowell,  7.4 ;  while  in  Quincy  and 
Taunton  it  was  only  4.1  and  4.7,  respectively.  Thus,  there 
was  a  total  range  of  23.5  from  one  city  to  another  in  the 
same  State.  The  reports  of  the  New  York  Department  of 
Labor  show  that  the  state  of  employment  is  generally  far 
worse  in  New  York  City  than  in  other  parts  of  the 
State.  .  .  . 
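
The range quoted above can be checked directly from the six city percentages given in the text. The following is a minimal sketch in Python; the variable and function names are illustrative only and form no part of the Massachusetts reports.

```python
# Percentages of unemployment for March, 1915, as quoted in the text.
march_1915 = {
    "Boston": 13.9,
    "Brockton": 27.6,
    "Holyoke": 25.2,
    "Lowell": 7.4,
    "Quincy": 4.1,
    "Taunton": 4.7,
}


def percentage_range(figures):
    """Return (highest city, lowest city, range in percentage points)."""
    highest = max(figures, key=figures.get)
    lowest = min(figures, key=figures.get)
    # Round to one decimal place, the precision of the published figures.
    spread = round(figures[highest] - figures[lowest], 1)
    return highest, lowest, spread


highest, lowest, spread = percentage_range(march_1915)
print(highest, lowest, spread)
```

The range is simply the difference between the highest figure (Brockton) and the lowest (Quincy); it takes no account of the number of workmen covered in each city.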

The  most  noticeable  characteristic  of  the  statistics  is 
the  wide  fluctuation  in  the  percentages  of  unemployment 
from  month  to  month.  In  the  New  York  data,  which 
constitute the only statistical information as to unemployment
from month to month in all trades, the percentages
for all trades taken together gradually dropped from
January,  the  dullest  month  in  the  year,  to  September  and 
October,  and  rose  again  in  November  and  December. 
The  good  and  bad  seasons  vary  from  one  trade  to  another. 
Thus,  the  winter  months  furnish  less  employment  in  building 
trades  and  transportation,  but  more  employment  in  cloth- 
ing, textiles,  boots  and  shoes,  theaters  and  music.  The 
differences  among  the  various  trades  of  the  same  industry 
are equally important. For instance, in the garment
industry,  the  dull  seasons  in  dresses  and  waists  coincide 
with the periods of fairly intense activity in the manufacture
of petticoats. While the seasons of activity and
dullness  may  be  in  general  the  same  in  some  of  the  various 
industries,  the  duration  and  the  intensity  of  the  unem- 
ployment may  be  different.  In  the  clothing  industry  the 
seasonal  fluctuations  are  the  greatest,  for  in  some  of  its 
trades  there  is  an  almost  complete  stagnation  in  the  dull 
season. On the average, it may be said that the dull season
affects 80 per cent of the workmen in the clothing industry.
In the building trades the fluctuations due to
weather  conditions  mean  the  idleness  of  20  per  cent  of  the 
workmen  in  addition  to  the  number  normally  idle.  In 
metals  and  machinery  and  printing,  the  seasonal  fluctua- 
tions are  less,  amounting  to  but  three  or  four  per  cent  of  the 
workmen.  In  the  brewing  industry  the  seasonal  fluctua- 
tions mean  the  employment  of  all  workers  on  half  time, 
while  in  theaters  about  75  per  cent  of  the  workmen  are 
unemployed  during  the  summer  months.  .  .  . 

It  is  a  well-recognized  fact  that  wages  are  higher  in  trades 
which  are  affected  by  pronounced  seasonal  fluctuations  than 
in  trades  embracing  the  same  class  of  workmen  but  with 
greater  regularity  of  employment.  Thus,  the  hourly  wages 
of  bricklayers  are  considerably  higher  than  the  wages  of 
carpenters;  but  the  statistics  of  the  New  York  Depart- 
ment of  Labor  show  that  the  average  yearly  earnings  in 
the  two  trades  are  about  the  same.  Cabinet  makers  re- 
ceive lower  wages  than  carpenters  partly,  if  not  entirely, 
because  they  have  more  regular  employment.  The  rel- 
atively high  daily  wages  of  members  of  building-trades 
unions  are  frequently  used  to  indicate  high  yearly  earn- 
ings, yet  it  is  found  that  the  latter  are  but  little  more  than 
those in metals and machinery and slightly lower than
in  printing,  where  regular  employment  produces  high  yearly 
earnings  although  the  daily  wage  is  relatively  low. 
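
The relation between wage rates and yearly earnings described above may be sketched numerically. The hourly rates and the days worked below are invented for illustration and are not taken from the New York returns; the function name and the eight-hour day are likewise assumptions.

```python
def yearly_earnings_cents(hourly_wage_cents, hours_per_day, days_worked):
    """Yearly earnings in cents: hourly wage x hours per day x days worked."""
    return hourly_wage_cents * hours_per_day * days_worked


# Hypothetical trades (illustrative numbers only, not from the source):
# the higher hourly rate goes with fewer days of work in the year.
bricklayer = yearly_earnings_cents(hourly_wage_cents=70, hours_per_day=8, days_worked=180)
carpenter = yearly_earnings_cents(hourly_wage_cents=60, hours_per_day=8, days_worked=210)

print(bricklayer / 100, carpenter / 100)
```

Under these assumed figures the two trades earn the same amount in the year, although the hourly rates differ by ten cents; the wage rate alone therefore says little about earnings unless the regularity of employment is known.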



REVIEW 

1.  In  what  sense  or  senses  is  the  word  "unemployment"  used 
by  the  author?     By  the  different  collecting  agencies? 

2.  What  are  the  sources  of  statistics  on  unemployment  in  the 
United  States  according  to  the  author? 

3.  What  general  criticisms  from  a  statistical  point  of  view  are 
applicable  to  unemployment  statistics  collected  from  trade-union 
sources  in  New  York  and  Massachusetts? 

4.  What  statistical  success  have  American  trade  unions  had  in 
collecting  unemployment  statistics  concerning  their  own  members? 
Has  this  been  generally  true?  To  what  fundamental  condition  is 
this  due?  Can  the  conditions  be  changed  in  your  judgment? 
How? 

5.  From  the  unemployment  data  extant  what  are  the  most 
important  conclusions  which  may  be  drawn?  Would  you  think 
these  statistically  significant  in  view  of  the  nature  of  the  returns  ? 

6.  Do  the  major  fluctuations  in  employment  result  from  seasonal 
or  geographic  influences  ?     Can  your  answer  be  general  ?     Why  ? 

7.  What  supplementary  light  does  the  rate  of  wages  throw  on 
unemployment  ? 

8.  Contrast  "unemployment"  and  "fluctuations"  in  employ- 
ment. From  what  points  of  view  might  they  be  used  as  equivalent 
in  meaning?  When  would  it  be  necessary  to  discriminate  between 
them  ? 

9.  Consult  the  most  recent  of  any  of  the  Reports  on  Unemploy- 
ment to  which  reference  is  made  by  Dr.  Smelser.  Are  the  data 
given  in  tabular  or  graphic  form  ? 

(1)  Is  unemployment,  as  used,  defined;  are  the  conditions  to 
which  the  data  refer  clearly  indicated ;  are  you  in  doubt  in  any  way 
respecting  the  significance  of  the  data  either  absolutely  or  com- 
paratively?    In  what  respects? 

(2)  To  whom  would  the  data  seem  to  be  of  interest  ?  For  whom 
were  they  prepared? 


A List of Series Now Available Showing Volume of
Production in the United States

[Table listing the available series on volume of production; its entries are not recoverable from this copy.]

REVIEW 

1.  Consult  any  one  of  the  series  listed  above,  and  determine,  if 
possible, 

(1)  the  definition  of  the  unit  used. 

(2)  the  source  of  the  data  published. 

(3)  the  method  by  which  the  data  are  collected. 

(4)  the  nature  of  the  critical  comments  supplied  with  the  data. 

(5)  the  apparent  purpose  which  the  data  are  to  serve. 

(6)  the  consumer  to  which  they  are  addressed. 

2.  In  what  way,  if  at  all,  could  these  series  of  data  be  of  use,  for 

(1)  planning  internal  manufacturing  problems? 

(2)  measuring  market  trends? 

(3)  measuring  industrial  growth? 

(4)  helping  to  indicate  or  solve  employee-employer  relations  ? 

Sampling of Coal ¹

The  standard  specifications  require  that  a  sample  of  each 
delivery  of  over  twenty-five  tons  of  coal  be  analyzed  to  de- 
termine its  quality  and  the  acceptability  of  the  shipment. 
The  important  feature  of  sampling  is  to  secure  a  quantity 
representative  of  the  coal  delivered. 

Sampling in the field at the point of delivery shall be
done  under  the  control  and  supervision  of  the  borough 
engineer  of  the  borough  within  whose  limits  the  delivery 
is  to  be  made.  When  the  sample  is  taken  from  a  pile,  boat, 
or  car,  care  must  be  taken  to  secure  it  from  various  parts 
and  in  the  same  amounts  from  the  top,  the  middle,  and  the 
bottom.  When  coal  is  unloaded  by  conveyor,  samples 
shall  be  taken  by  hand  or  mechanical  means  from  the  mov- 
ing mass  at  regular  intervals. 

¹ Adapted with permission from Bulletin No. 2, Bureau of Economy and
Efficiency. City of New York, Department of Water Supply, Gas and Electricity,
pp. 27-29.

The  gross  sample  must  contain  the  same  proportion 
of  lump  and  fine  coal  as  exists  in  the  whole  shipment.  In 
order  to  avoid  gain  or  loss  in  moisture  samples  are  pro- 
tected from  the  weather  by  being  placed  in  a  covered  re- 
ceptacle until  the  gross  sample  can  be  quartered  down 
and  sent  to  the  laboratory.  The  size  of  the  gross  samples 
to  be  taken  depends  upon  the  size  of  the  delivery.  The 
standard  specifications  provide  that : 

For  deliveries  over  25  tons  and  less  than  100  tons  the  gross 
sample is 200 pounds.¹

For  deliveries  over  100  tons  the  sample  is  approximately  one  ton 
in  each  thousand  tons  (except  where  otherwise  specifically  provided)  .^ 
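
The two provisions above can be put in the form of a short rule. What follows is a sketch only, under stated assumptions: the ton is taken as 2000 pounds (the text does not say which ton is meant), a delivery of exactly 25 tons is treated as not sampled, and the function name is illustrative.

```python
POUNDS_PER_TON = 2000  # assumption: short tons; the specification does not say


def gross_sample_pounds(delivery_tons):
    """Gross sample weight in pounds, or None when no sample is taken."""
    if delivery_tons <= 25:
        # The footnote excludes shipments of less than 25 tons; treating
        # exactly 25 tons the same way is an assumption.
        return None
    if delivery_tons < 100:
        return 200
    # Approximately one ton of sample for each thousand tons delivered.
    return delivery_tons * POUNDS_PER_TON / 1000


for tons in (20, 50, 100, 1500):
    print(tons, gross_sample_pounds(tons))
```

With a ton of 2000 pounds the two provisions agree at 100 tons, where one ton in a thousand comes to 200 pounds, so that the rule has no break at that point.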

If  the  sample  of  coal  is  larger  than  the  pea  size  it  is  broken 
down  by  hand  or  by  passing  through  a  crusher  to  approxi- 
mate the  size  of  pea  (which  passes  through  a  f-inch  square 
mesh  and  over  a  ^-inch  square  mesh). 

After  being  reduced  to  a  standard  size  the  gross  sample 
shall  be  thoroughly  mixed  by  shoveling  it  over  and  over, 
and  is  then  formed  into  a  conical  pile  by  shoveling  the 
coal  from  the  edges.  When  the  cone  is  completed  it  shall 
be  cut  in  half  vertically  by  passing  a  piece  of  sheet-iron 
down  through  the  center  and  see-sawing  it  until  it  strikes 
the  floor.  The  two  halves  shall  then  be  separated  by  hold- 
ing the  iron  plate  firmly  vertical  and  moving  either  half  of 
the  cone  about  one  foot  away.  The  iron  plate  shall  then 
be  set  at  right  angles  to  its  first  position  and  the  cone  di- 
vided into  quarters.  Two  diagonally  opposite  quarters 
are  rejected.  In  the  two  remaining  quarters  the  larger 
lumps  are  broken  down  to  ^  inch  or  smaller.  The  two 
quarters are then thoroughly mixed, formed into a conical
pile, and quartered as before. The operation of breaking
down, mixing, and quartering is to be continued until
the sample has been reduced to about 5 pounds and to
i-inch size or smaller.

¹ Sample of shipment less than 25 tons shall not be taken.

The  sample  shall  be  worked  down  as  rapidly  as  possible 
to  avoid  change  in  the  percentage  of  moisture  through 
exposure  to  the  air. 
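
Since two of the four quarters are rejected at each quartering, each round keeps about half of the coal, provided the quarters are equal. The sketch below, in Python, lists the weight remaining after each round. The stopping rule (stop when a further round would carry the sample farther from 5 pounds) is an assumption made for illustration, since the text says only "about 5 pounds"; the function name is likewise illustrative.

```python
def quartering_weights(gross_pounds, target_pounds=5.0):
    """Weights remaining after successive rounds of quartering.

    Each round rejects two diagonally opposite quarters, so half of the
    sample is kept (assuming the four quarters are of equal weight).
    """
    weights = [float(gross_pounds)]
    # Assumed stopping rule: quarter again only while doing so brings the
    # sample closer to the target weight than it is now.
    while abs(weights[-1] / 2 - target_pounds) < abs(weights[-1] - target_pounds):
        weights.append(weights[-1] / 2)
    return weights


for round_number, pounds in enumerate(quartering_weights(200)):
    print(round_number, pounds)
```

For a gross sample of 200 pounds this gives five quarterings, leaving 6.25 pounds; a sixth would leave 3.125 pounds, which is farther from the 5 pounds named in the specification.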

REVIEW 

1.  In  what  respects  does  the  analogy  between  sampling  as  a 
process  in  coal  selection  and  sampling  as  a  statistical  device  for 
characterizing  a  labor  force,  for  instance,  seem  to  be  complete? 
In  what  ways  imperfect?  Would  you  purchase  labor  on  the  basis 
of  samples  ?     Is  it  done  ? 

2.  Generalizing  on  your  answers  to  the  question  above,  formu- 
late in  writing  a  general  statement  of  the  conditions  to  be  observed 
in  statistical  sampling. 

3. What would you say, from the point of view of sampling,
about figures purporting to show the average depth of spring and
fall plowing in a number of States, or the statement that "in Illinois
fall plowing is deeper than spring plowing, whereas in Indiana, the
reverse is true. . . ."?¹


Government Crop Reports ²

The  practical  value  of  the  Government  crop  estimates 
results  from  the  fact  that  they  are  based  upon  reports  of 
farmers  and  others  in  every  county  and  township  in  the 
United  States  and  upon  reports  of  trained  field  agents  in 
each  State ;  they  are  made  monthly  during  the  crop  sea- 
son ;  they  are  checked  up  from  every  possible  source  of 
information;  the  final  reports  are  prepared  and  issued  by 
a crop-reporting board of experts; and all Government
employees engaged in the preparation of the crop estimates
are prohibited by law from giving out information concerning
them or utilizing information so obtained for their
own benefit, directly or indirectly, prior to the date and
hour of publication, so that the reports when issued are
known to be as accurate as it is practicable to make them,
as well as impartial, disinterested, and therefore dependable.
No public organization, and certainly no private corporation,
in the United States and probably in the world, is so
well organized and equipped for the work of reporting on
crop conditions and prospects as the present Bureau of
Crop Estimates.

¹ Monthly Crop Report, February, 1918, p. 17.

² Adapted with permission from "Government Crop Reports: their
Value, Scope, and Preparation," United States Department of Agriculture,
Bureau of Crop Estimates, Circular 17. Revised, pp. 8-26.

Without  such  a  system  of  Government  crop  estimates, 
speculators  interested  in  raising  or  lowering  prices  of  farm 
products  would  issue  so  many  conflicting  and  misleading 
reports  that  it  would  be  practically  impossible  for  any  one, 
without  great  expense,  to  form  an  accurate  estimate  of  crop 
conditions  and  prospects.  Farmers  would  suffer  most 
from  such  conditions,  because  they  are  not  so  well  organ- 
ized as  other  lines  of  business  nor  are  they  in  a  position 
to  take  advantage  of  fluctuations  in  market  prices. 

Farmers  are  benefited  by  the  Government  crop  reports 
both directly and indirectly; directly, by being kept informed
of crop prospects and prices outside of their own
immediate  districts,  and  indirectly,  because  the  disin- 
terested reports  of  the  Government  tend  to  prevent  the 
circulation  of  false  or  misleading  reports  by  speculators 
who  are  interested  in  controlling  or  manipulating  prices. 

The  farmer  cannot,  by  refusing  to  report  the  condi- 
tion of  crops  for  his  locality,  prevent  buyers  and  specu- 
lators from  knowing  the  condition  of  the  crop.  It  is  well 
known that speculators and large dealers in farm products
do not depend entirely upon Government reports
for  information  concerning  crop  prospects.  They  main- 
tain regular  systems  of  their  own  for  collecting  crop  in- 
formation. They  have  traveling  agents  and  correspondents 
(usually  local  buyers)  throughout  the  United  States  who 
keep  them  posted,  and  the  large  buyer  or  speculator,  in 
return,  gives  these  local  buyers  or  correspondents  infor- 
mation in  regard  to  general  conditions  and  prices.  The 
local  buyers  know  the  conditions  of  crops  in  their  own  vi- 
cinity better,  as  a  rule,  than  the  average  farmer,  because 
it  is  their  business  to  keep  well  informed. 

If  the  Government  crop  estimates  should  be  discontinued, 
the farmer would have no reliable information concerning
crop prospects except in his own immediate neighborhood,
and for crop prospects in other localities he would have to depend
upon such information as interested speculators and
dealers  might  choose  to  publish  in  the  newspapers,  which 
might  or  might  not  be  correct.  Prices  in  his  own  local  market 
are  influenced,  as  a  rule,  more  by  the  condition  of  the  whole 
crop  throughout  the  State  or  the  United  States,  and  even  in 
foreign  countries,  than  they  are  by  local  conditions.  The 
entire  wheat  crop  of  his  county  may  be  destroyed  and  yet 
prices  may  be  low,  or  his  county  may  have  a  bumper  crop 
and  prices  be  unusually  high,  depending  upon  whether  or 
not  there  is  a  surplus  or  deficiency  in  the  entire  crop  else- 
where. In  a  sense  the  Bureau  of  Crop  Estimates  is  a  form 
of  farmers'  cooperation,  wherein  each  farm  crop  reporter 
gives information about his locality and in return receives
information  about  the  entire  country,  the  bureau  merely 
acting  as  a  clearing  house  for  such  cooperative  exchange. 

Some  of  the  private  crop  reports  which  are  published 
in  the  newspapers  are  honestly  prepared  and  are  more  or 
less  accurate,  depending  upon  the  extent  and  sources  of 
information; on the other hand, misleading crop reports
are  known  to  be  frequently  circulated  in  order  to  raise  or 
lower  prices  in  the  interest  of  speculators.  If  the  farmer 
reads  the  crop  estimates  and  forecasts  of  the  Government 
as  they  are  issued,  he  will  be  in  a  position  to  judge  for  him- 
self what  the  crop  prospects  are,  as  well  as  probable  prices, 
so  that  he  can  decide  intelligently  how  to  market  his  prod- 
uce and  how  to  deal  with  the  local  buyers.  Even  the 
farmers  who  do  not  keep  posted  are  indirectly  benefited 
by  the  publication  of  Government  crop  estimates,  be- 
cause these  estimates  automatically  tend  to  check  and  lessen 
the  injurious  effects  of  false  reports  sent  out  broadcast 
by  interested  speculators  and  their  agents  in  the  same  way 
that  a  police  or  constable  force  tends  to  check  but  not  en- 
tirely prevent  crime  in  a  community. 

The  more  certainty  there  is  as  to  the  probable  supply 
and  demand  the  less  chance  for  speculation  and  loss  in 
the  business  of  distributing  and  marketing  the  crop,  which 
is  a  benefit  both  to  the  producer  and  to  the  consumer. 

Large  manufacturing  firms,  agricultural  implement  and 
hardware  companies,  who  neither  buy  nor  sell  farm  prod- 
ucts, are  much  interested  in  crop  prospects.  This  knowl- 
edge enables  them  to  distribute  their  wares  economically, 
sending  much  to  sections  where  crops  are  good  and  farmers 
have  money  with  which  to  buy,  and  less  to  sections  where 
crops  are  short  and  farmers  will  have  less  to  spend.  Few 
farmers  realize  how  much  is  saved  by  an  even  distribution 
of  manufactured  articles  according  to  crop  prospects.  If 
manufacturers avoid heavy losses from improper distribution,
they can afford to sell on better terms, with resulting
benefit to farmers.

The  railroads  of  the  country,  which  move  crops  from 
the farm to the market, must know in advance the probable
size of the crop in order to provide a sufficient number
of  cars  to  handle  it  effectively  and  without  delay.  Cases 
are  not  infrequent  when  prices  of  grain  at  railroad  sta- 
tions are  reduced,  or  there  is  absolutely  no  sale  for  the  grain 
because  cars  are  not  available  for  shipping,  the  farmer 
thus  being  among  the  sufferers. 

Prompt  and  reliable  information  regarding  crop  pros- 
pects is  equally  important  and  valuable  in  the  conduct  of 
commercial,  industrial,  and  transportation  enterprises. 
The earlier the information regarding the probable production
of the great agricultural commodities can be published,
the more safely and economically can the business
of  the  country  be  managed  from  year  to  year. 

Retail dealers in all lines of goods, whether in the city
or  in  the  country,  order  from  wholesale  merchants,  jobbers, 
or  manufacturers,  the  goods  they  expect  to  sell  many  weeks 
and  frequently  many  months  before  actual  purchase  and 
shipment.  Jobbers  follow  the  same  course,  and  manu- 
facturers produce  the  goods  and  wares  handled  by  merchants 
of  every  class  far  ahead  of  the  time  of  their  actual  distri- 
bution and  consumption.  It  is  therefore  important  that 
they  have  the  earliest  information  possible  with  respect 
to  crop  prospects  and  the  probable  purchasing  power  of 
the  farmers. 

With  such  information  carefully  and  scientifically  gath- 
ered and  compiled,  and  honestly  disseminated,  so  that  it 
can  be  depended  upon  to  be  as  accurate  as  any  forecast 
or estimate can possibly be, and relied upon as emanating
from  an  impartial  and  disinterested  source,  the  farmers, 
the  merchants,  the  manufacturers,  and  the  transporta- 
tion and  distributing  agencies  of  the  country  can  act  with  a 
degree of prudence and intelligence not possible were the
information  lacking. 


Scope  of  Crop  Reports 

Beginning  with  planting,  data  are  gathered  and  reports 
made  as  to  the  condition  and  acreage  of  each  of  the  prin- 
cipal agricultural  products,  such  as  corn,  wheat,  oats,  rye, 
barley,  potatoes,  hay,  cotton,  tobacco,  rice,  etc.  As  the 
crops  progress  the  prospects  are  reflected  in  monthly  condi- 
tion reports  upon  each  growing  crop;  such  reports  being 
expressed  in  percentages,  100  representing  a  normal  con- 
dition. Condition  reports,  expressed  in  percentages  of  a 
normal,  when  published,  are  coupled  with  a  statement  of 
the  averages  of  similar  reports  at  corresponding  dates  in 
preceding  years  (usually  10-year  averages) ;  by  such  com- 
parison the  condition  of  crops  in  comparison  with  the 
average  condition  is  readily  obtained.  At  harvest  time 
the  yields  per  acre  are  ascertained,  which,  being  multiplied 
by  the  acreage  figures  already  ascertained,  give  the  pro- 
duction. ... 
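
The arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function names and every figure below are hypothetical and are not taken from any report of the bureau.

```python
def condition_vs_average(condition_pct, ten_year_avg_pct):
    """Express a condition report as a percentage of the 10-year average."""
    return 100.0 * condition_pct / ten_year_avg_pct


def production(yield_per_acre, acres):
    """Harvest-time production: yield per acre multiplied by acreage."""
    return yield_per_acre * acres


# Hypothetical figures, for illustration only.
relative = condition_vs_average(condition_pct=84.0, ten_year_avg_pct=80.0)
total_bushels = production(yield_per_acre=25.0, acres=1_000_000)

print(relative)        # condition relative to the 10-year average
print(total_bushels)   # bushels produced
```

A condition of 84 against a 10-year average of 80 reads as 5 per cent above the usual condition for that date, even though it is 16 points below the normal of 100.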

Methods  of  Crop  Reporting 

The  reports  issued  by  the  Bureau  of  Crop  Estimates  dur- 
ing the  year  include  data  relating  to  acreages,  conditions, 
yields, supplies, qualities, and values of farm crops, numbers
by classes, condition, and values of farm animals,
etc.  The  data  upon  which  such  estimates  are  based  are 
obtained  through  a  field  service  consisting  of  a  corps  of  paid 
State  field  agents  and  crop  specialists  and  a  large  body  of 
voluntary  crop  reporters  composed  of  the  following  classes : 
county  reporters,  township  reporters,  individual  farmers, 
and  several  lists  of  reporters  for  special  inquiries. 

The  field  service  consists  of  trained  field  agents,  one 
assigned  to  a  single  State  or  group  of  smaller  States  which 
in the aggregate corresponds in area and crop production
to  one  of  the  larger  States,  who  devote  their  entire  time 
to  the  work  and  who  travel  throughout  their  territory  dur- 
ing the  crop  season,  personally  inspecting  crop  areas,  con- 
ferring with  State  and  local  authorities,  private  and  com- 
mercial agencies,  and  others  interested  in  crop-reporting 
work.  Each  agent  supplements  his  own  observation  with 
reports  from  a  corps  of  selected  crop  reporters  in  his  terri- 
tory, who  report  directly  to  him  and  are  wholly  independent 
of  the  regular  crop  reporters  who  report  directly  to  the 
bureau. 

In  addition  to  the  regular  force  of  State  field  agents  the 
bureau has a small force of crop specialists, one or more
for  each  of  the  important  special  crops,  such  as  cotton, 
tobacco,  rice,  and  truck  crops,  possessing  the  same  quali- 
fications and  performing  the  same  duties  as  the  field  agents, 
but  devoting  their  entire  time  to  specializing  on  the  par- 
ticular crops  to  which  they  are  assigned  and  traveling 
throughout  the  entire  region  in  which  they  are  grown. 
These  crop  specialists  also  have  selected  lists  of  crop  cor- 
respondents reporting  directly  to  them. 

Both  the  State  field  agents  and  the  crop  specialists  are 
in  the  classified  service  and  are  appointed  only  upon  certifi- 
cation by  the  Civil  Service  Commission  after  a  rigid  com- 
petitive examination.  They  are  selected  for  their  special 
training  and  qualifications  for  the  work  and,  as  they  ac- 
quire knowledge  and  experience,  will  become  recognized 
authorities  in  crop  production  in  each  State. 

There  are  approximately  2800  counties  of  agricultural 
importance  in  the  United  States.  In  each  the  depart- 
ment has  a  principal  county  reporter  who  maintains  an 
organization  of  several  assistants.  These  county  reporters 
are  selected  with  special  reference  to  their  qualifications 
and constitute an efficient branch of the crop-reporting service.
They make the county the geographical unit of their
reports,  and,  after  obtaining  data  each  month  from  their 
assistants  and  supplementing  these  with  information  ob- 
tained from  their  own  observation  and  knowledge,  report 
directly  to  the  department  at  Washington. 

In  practically  all  of  the  townships  and  voting  precincts 
of the United States in which farming operations are extensively
carried on the department has "township" reporters
who make their immediate neighborhood area with
which  they  are  personally  familiar  the  geographical  basis 
of  reports,  which  they  also  send  directly  to  the  department 
each  month.     There  are  about  32,000  township  reporters. 

Finally,  at  the  end  of  the  growing  season  a  large  num- 
ber of  individual  farmers  and  planters  report  on  the  re- 
sults of  their  own  individual  farming  operations  during  the 
year ;  valuable  data  are  also  secured  from  30,000  mills  and 
elevators. 

Because  of  the  specialized  nature  of  the  cotton  crop  the 
reports  concerning  it  are  handled  separately  from  reports 
on  all  other  crops.  In  addition  to  the  regular  estimates 
of  the  State  agents,  the  cotton  crop  specialist,  and  the 
county and township reporters, the bureau obtains reports
on  acreage,  yields,  percentage  ginned,  etc.,  from  many 
thousand  special  reporters  who  are  intimately  concerned 
in  the  crop,  including  practically  all  the  ginners. 

Transmission of Reports to Bureau by Correspondents

Previous  to  the  preparation  and  issuance  of  the  bureau's 
reports  each  month  the  correspondents  of  the  several  classes 
send  their  reports  separately  and  independently  to  the  de- 
partment at  Washington. 


In  order  to  prevent  any  possible  access  to  reports  which 
relate  to  speculative  crops,  and  to  render  it  absolutely  im- 
possible for  premature  information  to  be  derived  from 
them,  all  of  the  reports  from  the  State  field  agents,  as  well 
as those from the crop specialists, are sent to the Secretary
of  Agriculture  in  specially  prepared  envelopes.  By  an  ar- 
rangement with  the  postal  authorities  these  envelopes  are 
delivered to the Secretary of Agriculture in sealed mail
pouches.  These  pouches  are  opened  only  by  the  Secretary 
or  Assistant  Secretary,  and  the  reports,  with  seals  unbroken, 
are  immediately  placed  in  a  safe  in  the  Secretary's  office, 
where  they  remain  sealed  until  the  morning  of  the  day  on 
which the bureau report is issued, when they are delivered
to  the  statistician  by  the  Secretary  or  the  Assistant  Secre- 
tary. The  combination  for  opening  the  safe  in  which  such 
documents  are  kept  is  known  only  to  the  Secretary  and  the 
Assistant  Secretary  of  Agriculture.  Reports  from  field 
agents  and  crop  specialists  residing  at  points  more  than 
500  miles  from  Washington  are  sent  by  telegraph,  in  cipher. 
The  reports  from  the  county  correspondents,  township 
correspondents,  and  other  voluntary  crop  reporters  are 
sent  to  the  Chief  of  the  Bureau  of  Crop  Estimates  by  mail 
in  sealed  envelopes. 

Preparation  of  Reports 

The  reports  received  by  the  department  from  the  dif- 
ferent classes  of  individual  correspondents  are  tabulated 
and  compiled  and  the  figures  for  each  separate  State  com- 
puted. After  the  reports  from  the  different  counties  are 
tabulated,  a  true  weighted  figure  for  the  State  is  secured 
by  taking  into  consideration  the  relative  value  which  the 
total acreage or production of each county in the State
bears  to  the  total  acreage  or  production  of  the  State.  The 
weight figure showing the value of the county is applied
to  the  acreage,  yield  per  acre,  or  condition,  whichever  it 
may  be,  and  from  the  totals  of  the  weights  and  the  ex- 
tensions a  weighted  average  for  the  State  is  ascertained. 
The  averages  for  speculative  crops  (corn,  wheat,  oats,  and 
cotton)  are  determined  by  computers  who  do  not  know 
the  particular  State  to  which  their  figures  relate. 
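
The weighting step just described can be illustrated with a short Python sketch. The county figures are hypothetical, and the choice of acreage as the weight is an assumption made for the example; the text allows either acreage or production.

```python
def weighted_average(values, weights):
    """Sum of the extensions (weight * value) divided by the sum of weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    extensions = [v * w for v, w in zip(values, weights)]
    return sum(extensions) / total_weight


# Hypothetical county condition reports (per cent of normal) and the
# county acreages used as weights.
county_condition = [90.0, 80.0, 70.0]
county_acres = [50_000, 30_000, 20_000]

state_condition = weighted_average(county_condition, county_acres)
simple_mean = sum(county_condition) / len(county_condition)

print(state_condition)  # weighted State figure
print(simple_mean)      # unweighted mean, for comparison
```

With these figures the weighted State condition is 83, against an unweighted mean of 80, because the county with the largest acreage also reports the best condition.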

The  work  of  making  the  final  crop  estimates  each  month 
culminates  at  sessions  of  the  crop-reporting  board,  com- 
posed of  five  members,  presided  over  by  the  statistician 
and  chief  of  bureau  as  chairman,  whose  services  are  brought 
into  requisition  each  crop-reporting  day  from  among  stat- 
isticians and  officials  of  the  bureau,  and  field  agents  and  crop 
specialists who are called to Washington for the purpose.

The  personnel  of  the  board  is  changed  each  month.  The 
meetings  are  held  in  the  office  of  the  statistician,  which 
is  kept  locked  during  sessions,  no  one  being  allowed  to  enter 
or  leave  the  room  or  the  bureau,  and  all  telephones  being 
disconnected. 

When  the  board  has  assembled,  reports  and  telegrams 
regarding  speculative  crops  from  field  agents  and  crop 
specialists, which have been placed unopened in a safe in the
office of the Secretary of Agriculture, are delivered by the
Secretary,  opened,  and  tabulated;  and  the  figures,  by 
States,  from  the  several  classes  of  correspondents  and 
agents  relating  to  all  crops  dealt  with  are  tabulated  in 
convenient  parallel  columns;  the  board  is  thus  provided 
with  several  separate  estimates  covering  each  State  and 
each  separate  crop,  made  independently  by  the  respective 
classes  of  correspondents  and  agents  of  the  bureau,  each 
reporting  for  a  territory  or  geographical  unit  with  which  he 
is  thoroughly  familiar. 


Abstracts  of  the  weather  condition  reports  in  relation 
to  the  different  crops,  by  States,  are  also  prepared  from  the 
weekly  bulletins  of  the  Weather  Bureau.  With  all  these 
data  before  the  board,  each  individual  member  computes 
independently,  on  a  separate  sheet  or  final  computation 
slip,  his  own  estimate  of  the  acreage,  condition,  or  yield 
of  each  crop,  or  of  the  number,  condition,  etc.,  of  farm 
animals,  for  each  State  separately.  These  results  are  then 
compared  and  discussed  by  the  board  under  the  super- 
vision of  the  chairman,  and  the  final  figures  for  each  State 
are  decided  upon. 

The  estimates  by  States  as  finally  determined  by  the 
board  are  weighted  by  acreage  or  other  figures  representing 
the  relative  importance  of  the  crop  in  the  respective  States, 
the  result  for  the  United  States  being  a  true  weighted 
average  for  each  subject. 

Method  of  Issuing  Reports 

Reports  in  relation  to  cotton,  after  being  prepared  by 
the  crop-reporting  board  and  personally  approved  by  the 
Secretary  of  Agriculture,  are  issued  on  or  about  the  first  day 
of each month during the growing season, and reports relating
to the principal farm crops and live stock about
the  seventh  or  eighth  day  of  each  month.  In  order  that 
the  information  contained  in  these  reports  may  be  made 
available  simultaneously  throughout  the  entire  United 
States,  they  are  handed,  at  an  announced  hour  on  report 
days,  to  all  applicants  and  to  the  Western  Union  Tele- 
graph Co.  and  the  Postal  Telegraph  Cable  Co.,  which  have 
branch offices in the Department of Agriculture, for transmission
to the exchanges and to the press. These companies
have reserved their lines at the designated time, and
forward  immediately  the  figures  of  most  interest.  A  multi- 
graph  statement,  containing  such  estimates  of  condition 
or  actual  production,  together  with  the  corresponding 
estimates  of  former  years  for  comparative  purposes,  is 
prepared  and  mailed  immediately  to  newspaper  publica- 
tions. 

The  crop  estimates  for  the  State  and  for  the  United  States 
as  a  whole  are  telegraphed  immediately  to  the  Weather 
Bureau  station  director  of  each  State,  in  whose  office  copies 
are  printed  and  mailed  to  all  the  local  papers  in  the  State, 
so  that  the  crop  estimates  of  the  bureau  are  published 
throughout  the  United  States  within  24  hours  of  their 
issuance. 

Promptly  after  the  issuing  of  the  report,  it,  together 
with  other  statistical  information  of  value  to  the  farmer 
and  the  country  at  large,  is  published  in  the  Agricultural 
Outlook,¹ a publication of the Bureau of Crop Estimates,
under the authority of the Secretary of Agriculture. An
edition of over 225,000 copies is distributed to the correspondents
and other interested parties throughout the
United States each month.

¹ Supplanted by The Monthly Crop Reporter, January 1, 1918.

Acreage Estimates

For many years, in fact since the bureau was organized
in 1862, it has been the practice to accept the estimates
of acreage planted to different crops as reported by the
Bureau of the Census every 10 years.² Then in the first
year following the census the crop reporters of this bureau
would estimate the acreage planted as a percentage of the
acreage reported by the census for the preceding year;
the second year following the census the acreage would be
estimated as a percentage of the acreage estimated the
preceding year, and so on until figures for the next census are
available. Theoretically, if there is no bias or tendency to
underestimate or overestimate on the part of crop reporters, the
acreage estimates by this method for the tenth year after
the census would agree with the acreage reported by the
census for that year. A weak point in the system which
has long been recognized is the fact that individual crop
reports are not free from bias, and there appears to be a
fairly uniform tendency to either overestimate or underestimate
the acreage, the result being a cumulative error
which in 10 years is apt to result in a wide discrepancy
between the estimates of this bureau and the figures of the
census. To illustrate, if the Bureau of the Census should
report 10,000,000 acres planted to a given crop, and there
should be a uniform tendency on the part of crop reporters
of this bureau to underestimate the acreage of this crop an
average of 2 per cent annually, this bureau might estimate
the acreage as 9,800,000 acres the first year after the census,
as 9,604,000 acres the second year, as 9,412,000 acres
the third year, and so on until the tenth year, when the
bureau's estimate for the crop would be 8,170,000. If
during the 10-year period there had actually been no change
in the acreage planted to the particular crop in question,
and the census should again report an acreage of 10,000,000,
the result would be a manifest discrepancy of 1,830,000
acres between the figures of this bureau and those of the
census. Further discrepancies would appear in the yield
per acre and the total yield.

² Prior to 1880 the Census did not show acreages of crops, merely production;
hence in the earlier years the acreage basis was obtained by dividing
the census report of total production by an estimated yield per acre.
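
The compounding in the illustration above can be reproduced with a short Python sketch. The function name is hypothetical; the inputs (a census base of 10,000,000 acres and a constant 2 per cent underestimate) are the ones the text supplies.

```python
def chained_estimates(census_acres, yearly_ratio, years):
    """Each year's estimate is the previous year's estimate times a ratio."""
    estimates = []
    current = census_acres
    for _ in range(years):
        current *= yearly_ratio
        estimates.append(current)
    return estimates


# A true acreage that never changes, reported each year at 98 per cent
# of the preceding year's estimate.
estimates = chained_estimates(10_000_000, 0.98, 10)
for year, acres in enumerate(estimates, start=1):
    print(year, round(acres))

discrepancy = 10_000_000 - estimates[-1]
print(round(discrepancy))
```

The tenth-year figure comes out slightly above 8,170,000 and the discrepancy slightly below 1,830,000; the text's figures are these values in round numbers.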

At  or  near  the  close  of  harvest  each  year  agents  and 
crop reporters of the bureau estimate the yield per acre,
in  bushels,  pounds,  or  tons,  according  to  the  nature  of  the 
product.  The  estimate  of  total  production  is  readily  ob- 
tained by  multiplying  the  yield  per  acre  thus  obtained 
by  the  previously  estimated  total  number  of  acres. 

It  will  be  observed  that  the  method  of  estimating  the 
yield  per  acre  differs  materially  from  the  method  of  esti- 
mating the  total  acreage,  the  acreage  estimate  being  based 
upon  a  percentage  of  the  preceding  year's  acreage,  thus 
carrying  on  from  year  to  year  any  error  made  in  any  pre- 
vious year;  whereas  the  yield-per-acre  estimate,  being  based 
upon  the  one  year  and  not  referring  to  any  former  year, 
is  not  affected  by  any  error  of  a  previous  year.  A  con- 
stant yearly  underestimate  of,  say,  2  per  cent  in  the  acreage 
will  be  magnified  to  a  difference  of  about  10  per  cent  in  5 
years  and  20  per  cent  (approximately)  in  10  years.  A 
constant  yearly  underestimate  of  2  per  cent  in  the  yield 
per  acre  will  not  be  magnified  in  5  or  10  years,  but,  on  the 
other  hand,  in  comparing  one  year's  estimated  yield  with 
another  the  errors  will  be  neutralized ;  that  is,  the  effect 
would  be  the  same,  so  far  as  comparative  value  is  con- 
cerned, as  though  no  error  had  occurred.  In  short,  biased 
errors  in  acreage  estimates  by  percentage  grow  from  year 
to  year;  biased  errors  in  yield-per-acre  estimates  neutralize 
each  other. 
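
The contrast drawn in this paragraph can be checked numerically. The sketch below assumes a constant 2 per cent underestimate, as in the text; the two yields are hypothetical, and the function names are ours.

```python
BIAS = 0.98  # assumed constant 2 per cent underestimate


def chained_shortfall_pct(years, ratio=BIAS):
    """Percentage shortfall after chaining a biased ratio for `years` years."""
    return 100.0 * (1.0 - ratio ** years)


def year_over_year_change(previous, current):
    """Ratio of one year's figure to the preceding year's."""
    return current / previous


# Acreage: each estimate is built on the last, so the bias accumulates.
print(round(chained_shortfall_pct(5), 1))    # 9.6
print(round(chained_shortfall_pct(10), 1))   # 18.3

# Yield per acre: each year is estimated afresh, so a constant bias
# cancels when one year is compared with another.
true_yields = [20.0, 25.0]                   # hypothetical bushels per acre
reported = [BIAS * y for y in true_yields]
print(year_over_year_change(*true_yields))
print(year_over_year_change(*reported))
```

The computed shortfalls of 9.6 and 18.3 per cent correspond to the text's "about 10 per cent" and "20 per cent (approximately)".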

The  Bureau  of  the  Census  enumerates  total  acres  and  total 
production  of  crops ;  if  yield  per  acre  is  wanted  it  is  ob- 
tained by  dividing  the  production  by  the  acres.  The 
Bureau  of  Crop  Estimates  obtains  directly  from  its  agents 
and  correspondents  estimates  of  acreage  (as  described) 
and  yield  per  acre  and  arrives  at  the  total  production  by 
multiplying  acreage  by  yield  per  acre. 

Notwithstanding  the  difference  in  methods  of  procedure, 
the estimates of yield per acre obtained by the Bureau
of  Crop  Estimates  in  census  years  and  the  figures  of  yield 
per  acre  obtained  by  the  census,  with  few  exceptions,  do  not 
vary  widely. 

Live-stock  Estimates 

Practically  the  same  difficulty  is  encountered  by  this 
bureau  in  its  estimates  of  the  numbers  of  different  classes 
of  live  stock,  i.e.  the  probable  cumulative  error  resulting 
from a uniform tendency to either underestimate or overestimate
and the consequent application of an erroneous
percentage  to  the  census  figure  the  first  year  and  to  an 
erroneous  basis  in  each  succeeding  year  until  the  next  cen- 
sus. A  further  cause  of  divergence  between  the  live-stock 
estimates  of  this  bureau  and  the  figures  of  the  census,  and 
between  any  two  census  years,  results  from  taking  the 
census  or  making  the  estimates  at  different  seasons  of  the 
year.  It  can  readily  be  seen  that  in  the  case  of  sheep  and 
swine  the  estimates  cannot  agree  unless  made  as  of  the 
same  date,  because  of  the  normally  wide  fluctuations  in 
numbers  due  to  natural  increase  during  a  few  months  in 
spring  and  the  large  decrease  due  to  slaughter  in  the  case 
of  swine,  and  also  from  exposure  and  other  causes  in  the 
case  of  sheep  during  the  winter  months. 

While  the  Bureau  of  Crop  Estimates  has  in  recent  years 
taken  cognizance  of  the  tendency  to  bias  on  the  part  of  its 
field  force  and  has  endeavored  to  make  such  allowance 
therefor  as  would  correct  the  errors  involved,  besides  check- 
ing its  estimates  against  the  returns  of  tax  assessors  in 
different  States  and  such  other  reliable  sources  of  infor- 
mation as  are  available,  it  has  felt  the  need  for  a  better 
method  of  estimating  acreages  and  live  stock  between  the 
census years.


Use  of  Rural  Mail  Carriers 

As  an  experiment,  and  with  the  cooperation  of  the  Post 
Office  Department,  an  attempt  was  made  in  the  winter 
months  of  1913-1914  to  secure  accurate  data  as  to  acreage 
planted and numbers of live stock in the State of Maryland
and 15 counties in South Carolina by means of short,
simple  schedules  left  in  mail  boxes  and  collected  by  the 
rural  mail  carriers.  In  theory  this  plan  should  result  in 
complete  returns  as  accurate  as  a  census,  but  in  practice 
it  was  found  that  less  than  40  per  cent  of  the  farmers  would 
fill  out  the  schedules.  The  experiment  demonstrated  that 
satisfactory  results  by  this  method  cannot  be  secured  with- 
out (1)  a  personal  canvass  and  actual  enumeration  by 
the  rural  mail  carriers  similar  to  that  of  the  census  enumera- 
tors ;  (2)  legislation  making  it  compulsory  upon  farmers 
to  supply  the  information  requested ;  or  (3)  a  long  cam- 
paign through  the  press  and  other  agencies  to  educate  the 
farmer  into  the  idea  of  furnishing  information  of  a  statistical 
nature  regarding  their  business,  primarily  for  their  own 
benefit  and  incidentally  for  the  benefit  of  others. 

Typical Farms for Estimating Acreage and Live Stock

The experiment in utilizing the services of rural mail
carriers for making an actual enumeration of acreages and
of live stock having proved inadequate and unsatisfactory,
even as a basis for estimating, it was decided to establish
a selected list of typical farmers in each county in the United
States who will agree in advance to cooperate with the department
to the extent of furnishing accurate statements
of acreages and live stock on their farms for a series of years.
These  reports  will  establish  a  basis  for  comparison  with  the 
census  figures  and  will  enable  the  department  to  estimate 
with  a  high  degree  of  accuracy  the  changes  which  take  place 
annually between censuses. In future years it will be a
simple  matter  to  apply  the  rate  of  increase  or  decrease  in 
acreages  and  live  stock  which  is  found  to  take  place  on  the 
selected  typical  farms  in  each  county  to  the  total  num- 
ber of  farms  reported  by  the  Bureau  of  the  Census,  and  the 
results  can  be  used  to  check  the  estimates  prepared  on  the 
percentage  basis  under  the  present  system.  A  much  higher 
degree  of  accuracy  will  also  be  possible  with  census  returns 
available every 5 years, as will be the case hereafter, instead
of only once in 10 years as heretofore.
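
One way to picture the typical-farm method is the following Python sketch. It assumes the rate of change is taken as the ratio of totals on the same farms in two successive years and is then applied to a county base figure; the farms, the acreages, and the base of 120,000 acres are all hypothetical.

```python
def rate_of_change(sample_last_year, sample_this_year):
    """Ratio of this year's total to last year's on the same typical farms."""
    return sum(sample_this_year) / sum(sample_last_year)


def projected_total(base_total, rate):
    """Apply the sample rate of change to a county base figure."""
    return base_total * rate


# Hypothetical corn acreages on five typical farms in one county.
last_year = [40, 55, 30, 80, 45]
this_year = [42, 50, 33, 84, 46]

rate = rate_of_change(last_year, this_year)
county_estimate = projected_total(120_000, rate)

print(rate)             # rate of change found on the typical farms
print(county_estimate)  # county acreage projected from the base figure
```

Because the rate comes from acreages actually recorded on the same farms, it can serve as a check on the percentage judgments of crop reporters.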

The  "Normal"  as  a  Basis  of  Condition  Reports 

Special  consideration  has  been  given  for  many  years  to 
the  so-called  "normal,"  representing  a  condition  or  yield 
of  100  per  cent,  in  terms  of  which  all  the  crop  condition 
estimates  of  this  bureau  are  expressed.  An  objection  to 
the  use  of  this  term  and  what  it  represents,  as  a  basis  for 
crop  reporting,  arises  from  its  apparent  vagueness  and  the 
fact that the yield represented by it is different for each
locality  and  even  for  each  farm,  thus  requiring  explanation 
in  order  to  be  understood.  The  principal  advantage  of  the 
term  "normal"  is  psychological  in  that  it  is  based  on  a 
fundamental  conception  which  is  fairly  uniform  and  clear 
in  the  minds  of  all  practical  farmers,  from  whom  over  99 
per  cent  of  the  crop  condition  reports  of  this  bureau  are 
received. 

But little observation and experience is required to
demonstrate that the average farmer thinks of his crops
as "crops" and not in mathematical terms of percentages
or  averages,  although  he  can  readily  express  the  estimated 
yield  of  the  crop  in  terms  of  bushels,  pounds,  or  tons.  When 
the  farmer  sows  the  seed  in  spring  he  knows  just  what  the 
field  ought  to  yield,  and  if  the  season  is  favorable,  he  ex- 
pects to  harvest  that  yield.  This  expected  yield  is  a  "full 
crop,"  such  as  he  has  harvested  in  the  past  in  favorable 
seasons.  It  is  neither  a  maximum  possible  or  even  a  bumper 
crop,  which  occurs  only  at  rare  intervals  when  conditions 
are  exceedingly  favorable,  nor  a  medium  or  small  crop 
grown  under  one  or  more  adverse  conditions.  Neither 
is it an average crop, which rarely occurs because of the
effect  on  the  average  of  extremely  low  or  extremely  high 
yields  in  exceptional  seasons.  It  is  rather  the  typical  crop 
represented  by  the  average  of  a  series  of  good  crops,  leav- 
ing out  of  consideration  altogether  the  occasional  bumper 
crop  and  the  more  or  less  frequent  partial  crop  failure. 
This  expected  yield  at  planting  time,  the  full  crop  that  the 
farmer  has  in  mind  when  he  thinks  of  the  yield  he  expects 
to  harvest,  or  the  typical  crop  represented  by  the  average 
of  good  crops  only,  is  the  "normal,"  or  standard  adopted 
by  this  bureau  for  expressing  condition  during  the  grow- 
ing season  and  yield  at  harvest  time. 

The  observation  is  sometimes  made,  as  a  criticism  of 
the  use  of  the  normal,  that  a  normal  crop  is  almost  never 
shown  in  the  reports  of  the  bureau.  A  little  reflection 
will  show  that  a  normal  yield  for  an  entire  State  or  the 
United  States  is  not  to  be  expected  except  on  rare  occa- 
sions. Imagine  the  yields  of  10  different  farmers  in  widely 
scattered  parts  of  the  United  States;  by  definition  of  the 
term  normal  as  a  "full  crop,"  or  expectation  of  yield  at 
planting  time,  an  individual  will  not  secure  a  normal  yield 
every year, or even every two years. Suppose each individual
secured a normal crop on the average every three
years,  by  the  law  of  probability  the  chance  of  all  10  farmers 
getting a normal crop in the same year is 1 to 30. If returns
of individuals were published, many normals would
be  shown,  but  the  frequency  would  be  less  in  a  county 
average,  still  less  in  a  State  average,  and  rare  in  a  United 
States  average. 
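
The appeal to the law of probability can be checked with a short calculation. The sketch assumes that the ten farmers' outcomes are independent and that each has a one-in-three chance of a normal crop in any year; independence is our assumption and is not stated in the text. On that assumption the multiplication rule gives a chance far smaller than the 1 to 30 quoted above, which only strengthens the conclusion that a normal for a large area is rare.

```python
from fractions import Fraction


def chance_all_normal(p_single, farmers):
    """Probability that every farmer has a normal crop in the same year,
    assuming the farmers' outcomes are independent of one another."""
    return p_single ** farmers


p = chance_all_normal(Fraction(1, 3), 10)
print(p)          # 1/59049
print(float(p))   # the same chance as a decimal
```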

The  crop  prospect  is  a  subject  of  vital  interest  to  farmers 
and, like the weather, it is a perennial topic of discussion
during  the  crop  season.  Almost  invariably  farmers  speak 
of  the  prospects  as  fine,  good,  fair,  or  poor,  and  they  de- 
scribe the  crop  as  "full  crop,"  "good  crop,"  "average  crop" 
(meaning  less  than  a  full  crop  but  a  little  better  than  the 
real  average),  "three-fourths  of  a  crop,"  or  "one-half  of  a 
crop," or less frequently "75 per cent of a crop," "50
per  cent  of  a  crop,"  etc.  In  the  South  the  cotton  crop 
prospect  is  usually  spoken  of  in  terms  of  bales,  as  "three- 
fourths  bale  per  acre,"  "one-half  bale  per  acre,"  or  "one- 
third  bale  per  acre."  Few  farmers  think  of  their  crops 
in  terms  of  exact  mathematical  averages  or,  in  fact,  know 
what  the  exact  average  really  is,  because  very  few  of  them 
keep  accurate  records  or  take  the  trouble  to  strike  averages 
from  them.  It  is  equally  true  that  farmers  do  not  generally 
speak  of  crop  conditions  and  crop  prospects  in  terms  of  a 
normal,  but  when  the  farmer  crop  reporters  are  told  that 
the  normal  is  the  same  as  their  conception  of  a  full  crop, 
the  crop  which  their  farms  ought  to  yield  and  are  expected 
to  yield  in  favorable  seasons,  and  that  this  normal  is  repre- 
sented by  100,  they  have  no  difficulty  in  clearly  understand- 
ing what  is  meant  by  the  normal  or  in  expressing  their 
estimates  in  percentages  of  normal. 

Reports  of  crop  condition  expressed  in  percentage  of 
normal  may  indicate  in  a  general  way  the  probable  yield, 
but as they do not include the variations in acreage it
would  be  impracticable  to  forecast  total  production  accu- 
rately from  condition  estimates  alone.  Hence,  to  avoid 
errors  in  the  interpretation  of  condition  estimates  by  those 
who  do  not  have  the  average  figures  before  them,  the  bureau 
converts  the  condition  estimates  into  quantitative  estimates 
of yield per acre, which, applied to the estimated acreage of
a  given  crop,  indicate  the  probable  total  production. 
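
The text does not give the bureau's conversion formula. The Python sketch below assumes the simplest reading, namely that a condition of 100 corresponds to the normal yield per acre and that the forecast yield is proportional to condition. The function names and all figures are hypothetical.

```python
def forecast_yield(condition_pct, normal_yield):
    """Forecast yield per acre, assuming yield is proportional to condition
    and that a condition of 100 corresponds to the normal yield."""
    return normal_yield * condition_pct / 100.0


def forecast_production(condition_pct, normal_yield, acres):
    """Forecast total production: forecast yield per acre times acreage."""
    return forecast_yield(condition_pct, normal_yield) * acres


# Hypothetical: condition 80, normal yield 30 bushels per acre.
print(forecast_yield(80.0, 30.0))

# The same condition implies different production when acreage differs.
print(forecast_production(80.0, 30.0, 2_000_000))
print(forecast_production(80.0, 30.0, 2_500_000))
```

The last two lines show why condition alone cannot forecast production: an identical condition figure yields a quarter more production on a quarter more acres.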

The  question  is  frequently  asked  why  the  crop  esti- 
mates are  not  (1)  based  on  the  average  crop  (presumably 
the  average  for  the  past  5,  10,  or  20  years),  or  (2)  on  the 
crop  of  the  preceding  year,  or  (3)  simply  estimated  for  the 
present  year  in  terms  of  bushels,  pounds,  or  tons. 

The answer to the first proposition is that no "average
crop"  can  properly  be  said  to  exist,  or  rather  it  would  not 
correspond  to  any  crop  actually  harvested,  because  the 
average  for  any  given  period  is  unduly  influenced  by  the 
exceptionally  low  or  high  yields  of  abnormal  seasons.  In 
other  words,  the  average  is  a  fluctuating  instead  of  a  fixed 
standard.  Furthermore,  it  would  be  exceedingly  difficult 
to  obtain  satisfactory  estimates  of  crop  prospects  based 
on  average  yields  from  farmer  crop  reporters,  who  con- 
stitute the  bulk  of  the  bureau's  field  force  in  reporting  on 
crop  conditions  during  the  growing  season.  Farmers  as  a 
rule do not keep a record of average yields on their farms or
for  their  communities.  They  do,  of  course,  remember 
abnormally  high  or  low  yields,  but  they  invariably  leave 
such  yields  out  of  consideration  when  estimating  crop 
prospects.  If  the  average  crop,  say,  for  a  period  cover- 
ing the  last  five  years,  were  adopted  as  the  standard,  it 
would  be  necessary  for  the  bureau  to  estimate  the  average 
condition  for  each  month  of  the  growing  season  and  the 
average  yield  for  each  year  in  each  county  and  township 
in  the  United  States  (over  30,000)  for  each  of  the  crops 



included  in  the  estimates  (50  or  more)  and  to  furnish  each 
crop  reporter  with  the  average  production  of  each  crop 
in  his  territory  for  use  in  making  up  his  monthly  estimates 
during  the  year.  This  would  entail  an  enormous  amount 
of  additional  work,  and  the  average  would  be  unsatisfactory 
because  the  smaller  the  unit  of  territory  the  greater  would 
be  the  fluctuation  in  the  average  or  standard  caused  by  crop 
failures  or  occasional  bumper  yields.  A  single  illustra- 
tion will  suffice  to  make  this  point  clear.  Taking  the  corn 
crop  of  Kansas  as  an  example,  the  average  yield  of  corn 
per acre in the State of Kansas for each of 10 years, beginning
with 1904, was as follows: 20.9, 27.7, 28.9, 22.1, 22,
19.9,  19,  14.5,  23,  3.2.  The  average  for  the  10  years  is  20.1 
bushels ;  the  average  for  the  last  five  years  is  15.9  bushels ; 
for  the  preceding  5  years  24.3  bushels.  On  the  other  hand, 
the  idea  of  a  normal  crop,  or  a  full  crop,  was  nearly  con- 
stant, being  31.7  for  the  last  5  years,  31.5  for  the  preceding 
5  years,  and  31.6  for  the  10  years. 
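
The Kansas figures can be checked with a few lines of arithmetic. The following is only a minimal sketch in Python; the variable names are illustrative and are not part of the bureau's procedure. It recomputes the three averages from the ten yields listed in the text and compares how far the average moves between periods with how far the reported "normal" moves.

```python
from statistics import mean

# Average corn yield per acre in Kansas (bushels) for the ten consecutive
# years listed in the text, in the order given there.
kansas_corn_yields = [20.9, 27.7, 28.9, 22.1, 22, 19.9, 19, 14.5, 23, 3.2]

ten_year_average = mean(kansas_corn_yields)
earlier_five_average = mean(kansas_corn_yields[:5])
latest_five_average = mean(kansas_corn_yields[5:])

# "Normal" (full crop) figures reported in the text for the same periods:
# latest 5 years, preceding 5 years, and all 10 years.
normal_figures = [31.7, 31.5, 31.6]

# How far each standard shifts depending on the period chosen.
average_spread = earlier_five_average - latest_five_average
normal_spread = max(normal_figures) - min(normal_figures)

print(f"10-year average:       {ten_year_average:.1f} bushels")
print(f"earlier 5-year average: {earlier_five_average:.1f} bushels")
print(f"latest 5-year average:  {latest_five_average:.1f} bushels")
print(f"spread of averages: {average_spread:.1f}; spread of normals: {normal_spread:.1f}")
```

On these figures the five-year average shifts by more than 8 bushels between the two periods, while the normal shifts by about 0.2 bushel, which is the point the illustration is meant to make.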

The  answer  to  the  second  proposition,  namely,  a  com- 
parison of  this  year's  crop  with  the  crop  of  the  preceding 
year,  is  that  while  farmers  remember  fairly  well  the  condi- 
tion and  yield  of  crops  for  the  past  year,  they  do  not  re- 
member them  with  sufficient  clearness  or  accuracy  to  be 
able  to  use  them  as  a  standard  of  comparison  for  this  year. 
Furthermore,  the  crops  of  last  year  may  have  been  ab- 
normally high  or  low,  and  would  therefore  make  a  very 
poor basis of comparison. For instance, the yield of corn
per  acre  in  Kansas  was  23  bushels  in  1912,  or  159  per  cent 
of  the  yield  per  acre  in  1911  (14.5  bushels).  The  yield 
in 1913, an abnormally dry season, was only 3.2 bushels
per  acre,  which  was  14  per  cent  of  the  yield  in  1912.  If 
the  yield  per  acre  of  corn  in  Kansas  for  1914  should  be  21 
bushels  per  acre,  it  would  be  656  per  cent  of  the  yield  of 



1913.  It  is  apparent,  therefore,  that  the  abnormally  low 
yield  of  1913  is  a  most  unsatisfactory  basis  of  comparison 
for  the  year  1914. 
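
The percentages quoted in this comparison follow from a single ratio. As a brief sketch (the function name is an illustrative choice, and the 1914 figure is the supposed value from the text, not an observed yield):

```python
def percent_of_previous(current_yield, previous_yield):
    """Express one year's yield as a percentage of the preceding year's."""
    return 100.0 * current_yield / previous_yield

# Kansas corn yields per acre (bushels) as quoted in the text.
yield_1911 = 14.5
yield_1912 = 23.0
yield_1913 = 3.2
supposed_1914 = 21.0  # hypothetical figure used in the text's illustration

ratio_1912 = percent_of_previous(yield_1912, yield_1911)
ratio_1913 = percent_of_previous(yield_1913, yield_1912)
ratio_1914 = percent_of_previous(supposed_1914, yield_1913)

print(round(ratio_1912), round(ratio_1913), round(ratio_1914))  # 159 14 656
```

A yield of 21 bushels is close to the ten-year average, yet on a previous-year basis it would be reported as more than six times the standard, simply because the standard itself was abnormal.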

The  third  proposition,  namely,  the  estimating  of  crops 
directly  in  terms  of  bushels,  pounds,  or  tons,  is  sometimes 
advanced.  The  objection  to  this  system  is  the  difficulty 
that  most  people  experience  in  estimating  accurately,  until 
near  harvest,  the  number  of  bushels  or  pounds  which  an 
acre  will  yield,  even  though  they  may  be  good  judges  and 
have  the  field  before  them.  Experience  has  demonstrated 
repeatedly  that  it  is  much  easier  to  estimate  proportions 
and  differences  in  comparing  one  period  with  another,  or 
the  production  of  one  year  with  the  production  of  another 
year,  or  condition  and  prospective  yield  with  some  stand- 
ard, such  as  a  normal,  than  it  is  to  estimate  quantitatively 
what the condition or yield for a given area actually is at
any  given  time.  Any  one  can  demonstrate  this  principle 
to  his  own  satisfaction  while  looking  at  a  shelf  partly  filled 
with  books  or  a  glass  partly  filled  with  beans.  The  shelf 
or  jar  becomes  in  each  case  the  standard  or  normal  repre- 
sented by  100  per  cent.  He  will  probably  find  that  he  can 
readily  estimate  that  the  shelf  or  jar  is  three-fourths  or  75 
per  cent  full,  and  while  he  may  be  able  to  guess  within  25 
per  cent  of  the  actual  number  of  books,  he  may  overestimate 
the  actual  number  of  beans  in  the  jar  more  than  100  per 
cent.  So  with  cereals  or  other  crops.  It  is  relatively  easy 
for  the  crop  reporter  to  estimate  the  prospects  as  90  per 
cent  of  the  normal  or  other  standard,  but  he  may  have 
difficulty  in  estimating  within  25  per  cent  of  the  actual  pros- 
pects in  terms  of  bushels.  Of  course,  crop  estimates  stated 
simply  as  percentages  of  a  normal  or  other  standard  would 
not  mean  much,  for  which  reason,  wherever  practicable, 
such   estimates   are   converted   into   numerical   statements 



by the bureau and their equivalents in bushels, pounds,
or  tons  are  published  in  comparative  statements  showing 
the  figures  for  the  previous  year  and  the  5  or  10  year  average. 
This  whole  subject  of  standards  or  bases  for  crop  reports 
has  been  thoroughly  and  repeatedly  considered,  both  in 
this  country  and  abroad.  On  every  occasion  when  the 
subject  has  been  considered  in  this  bureau  the  normal  has 
seemed  to  possess  more  advantages  and  fewer  disadvan- 
tages than  any  other  standard.  The  Canadian  govern- 
ment has  adopted  as  its  basis  of  crop  estimates  the  prin- 
ciple of  the  10-year  average.  The  10-year  average  has 
also  been  adopted  by  the  International  Institute  of  Agri- 
culture at  Rome,  and  the  institute  is  constantly  urging  its 
adoption  by  the  adhering  countries.  Great  Britain  still 
uses  the  10-year  average  as  the  standard,  which  is  fluctuat- 
ing. Germany  and  a  few  other  European  countries  use  the 
numbers  1  to  5,  inclusive,  to  represent  the  condition  of 
excellent,  good,  fair,  poor,  or  very  poor.  In  France  the 
same gradations of conditions are symbolized by 80 to 100,
60 to 80, 40 to 60, 20 to 40, and 1 to 20. The German system
results in confusion because in Germany the number 1
represents  the  highest  condition,  while  in  Sweden  it  repre- 
sents the  lowest  condition ;  besides,  the  terms  excellent, 
good,  fair,  or  poor  are  only  descriptive  and  are  open  to  in- 
terpretations which  interested  speculators  may  desire  to 
place  upon  them. 

Accuracy  of  Condition  Reports 

The  quantitative  interpretation  by  the  Department  of 
Agriculture  of  condition  reports  of  principal  crops,  except 
cotton,  was  begun  in  1911.  A  review  of  these  interpreta- 
tions, or  forecasts,  shows  that  those  made  in  June  varied 




an  average  of  11.2  per  cent  from  final  yield  estimates; 
those  in  July  varied  9.6  per  cent;  in  August  6.7  per  cent; 
in  September  4.3  per  cent ;  in  October  3.1  per  cent.  Gen- 
erally forecasts  made  one  and  two  months  before  the  har- 
vest inquiry  are  very  close  to  the  final  estimates  of  yield. 
The  above  percentages  do  not  reflect  the  accuracy  of  the 
work  of  estimating,  but  rather  reflect  the  variableness  of 
conditions  affecting  growing  crops,  which  is  shown  by 
changes  which  take  place  after  the  dates  to  which  the  con- 
dition reports  relate.  The  condition  of  a  corn  crop  on 
August  1  may  be  normal  with  a  forecast  of  35  bushels  per 
acre ;  but  the  crop  may  be  practically  ruined  10  days  later 
by  a  devastating  hot  wind,  and  the  final  yield  be  but  2  or 
3  bushels  per  acre.  The  forecasts  are  such  figures  that, 
based  upon  average  conditions  in  past  years,  there  is  an 
even chance or probability that the final yield will be either
above  or  below  the  figure  forecast.  A  variation  of  11.2 
per  cent  from  the  June  forecast  does  not  necessarily  indi- 
cate an  error  of  11.2  per  cent  in  the  forecast,  but  rather  in- 
dicates an  average  subsequent  change  in  condition  of  11.2 
per  cent  before  harvest. 
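
The kind of comparison described here can be sketched as follows. This is an illustration only, under stated assumptions: it uses the corn figures for 1911 to 1913 from the table of forecasts in this section, with the assignment of figures to months read from that table's column order, and it takes a simple unweighted mean of absolute percentage deviations. The bureau's own averages cover all crops and may have been computed differently, so the results need not equal the percentages quoted above.

```python
def mean_abs_percent_deviation(forecasts, finals):
    """Average absolute deviation of forecasts from final estimates, in per cent."""
    deviations = [abs(f - final) / final * 100.0 for f, final in zip(forecasts, finals)]
    return sum(deviations) / len(deviations)

# Corn, bushels per acre, 1911-1913 (as read from the table in this section).
corn_final = [23.9, 29.2, 23.1]
corn_july_forecast = [25.5, 26.0, 27.8]
corn_october_forecast = [23.8, 27.9, 22.2]

july_deviation = mean_abs_percent_deviation(corn_july_forecast, corn_final)
october_deviation = mean_abs_percent_deviation(corn_october_forecast, corn_final)

print(f"July forecasts differed from final yield by {july_deviation:.1f} per cent on average")
print(f"October forecasts differed by {october_deviation:.1f} per cent on average")
```

The deviation shrinks as harvest approaches, which agrees with the text's reading that the figures measure subsequent change in the crop rather than error in the estimating.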

The forecasts made during the past three years, and final
estimates of yield, are given below:

                                   Forecast made in                       Final
Crop and year           June    July  August  September  October  Estimate

Corn (bushels):
  1911                   ...    25.5    22.6       23.6     23.8      23.9
  1912                   ...    26.0    26.0       27.7     27.9      29.2
  1913                   ...    27.8    25.0       22.0     22.2      23.1
Winter wheat (bushels):
  1911                  15.3    14.6     ...        ...      ...      14.8
  1912                  14.1    13.9     ...        ...      ...      15.1
  1913                  15.9    15.6     ...        ...      ...      16.5
Spring wheat (bushels):
  1911                  13.7    11.8    10.1        9.8      ...       9.4
  1912                  13.8    14.1    15.1       15.6      ...      17.2
  1913                  13.5    11.7    12.5       13.0      ...      13.0
All wheat (bushels):
  1911                  14.7    13.5    12.8       12.6      ...      12.5
  1912                  14.0    14.0    15.1       15.4      ...      15.9
  1913                  15.0    14.1    15.0       15.2      ...      15.2
Oats (bushels):
  1911                  27.7    23.2    23.2       23.9      ...      24.4
  1912                  29.3    30.1    31.9       34.1      ...      37.4
  1913                  28.8    26.9    26.8       27.8      ...      29.2
Barley (bushels):
  1911                  24.9    20.9    19.8       20.3      ...      21.0
  1912                  25.2    25.6    26.7       27.6      ...      29.7
  1913                  24.4    22.8    23.1       23.2      ...      23.8
Rye (bushels):
  1911                  16.1    15.5     ...        ...      ...      15.6
  1912                  16.0    16.0     ...        ...      ...      16.8
  1913                  16.5    16.1     ...        ...      ...      16.2
Flaxseed (bushels):
  1911                   ...     8.6     7.6        7.7      8.1       7.0
  1912                   ...     9.4     9.4        9.7      9.8       9.8
  1913                   ...     8.7     8.3        8.4      8.7       7.8
Rice (bushels):
  1911                   ...    32.2    32.7       32.1     32.0      32.9
  1912                   ...    31.7    31.9       32.7     33.4      34.7
  1913                   ...    33.0    33.1       32.8     30.9      31.1
Potatoes (bushels):
  1911                   ...    81.7    71.5       74.2     79.7      80.9
  1912                   ...    95.5   100.7      108.0    108.8     113.4
  1913                   ...    93.1    92.0       88.1     86.7      90.4
Tobacco (pounds):
  1911                   ...   698.1   672.4      714.6    gOl.l     893.7
  1912                   ...   844.9   820.6      817.1    816.0     785.5
  1913                   ...   809.0   783.0      752.4    766.0     784.3
Hay (tons):
  1911                   ...    1.08    1.14        ...      ...      1.14
  1912                   ...    1.40    1.49        ...      ...      1.47
  1913                   ...    1.33    1.33        ...      ...      1.31
Buckwheat (bushels):
  1911                   ...     ...    18.1       19.6     19.6      21.1
  1912                   ...     ...    19.3       21.3     21.4      22.9
  1913                   ...     ...    20.1       18.2     16.5      17.2



Number of Pounds of Lint Cotton (Net Weight) as Estimated
in December, Annually, by the Department of Agriculture, and
as Subsequently Reported by the Bureau of the Census, for Each
of the Seasons 1900-1901 to 1913-1914, Inclusive, together with
the Percentage Overestimated or Underestimated by the
Department of Agriculture Each Season

                      Pounds of Cotton (000 omitted)
                         Estimated by        Finally
                        Department of    Reported by  Overestimated Underestimated
Crop Year                 Agriculture  Census Bureau       Per Cent       Per Cent

1900-1                      4,856,738      4,846,471            0.2            ...
1901-2                      4,529,954      4,550,950            ...            0.5
1902-3                      5,111,870      5,091,641            0.4            ...
1903-4                      4,889,796      4,716,591            3.7            ...
1904-5                      6,157,064      6,426,698            ...            4.2
1905-6                      4,860,217      5,060,200            ...            4.0
1906-7                      6,001,726      6,354,110            ...            5.5
1907-8                      5,581,968      5,312,950            5.1            ...
1908-9                      6,182,970      6,336,070            ...            2.4
1909-10                     4,826,344      4,783,220            0.9            ...
1910-11                     5,464,597      5,551,790            ...            1.6
1911-12                     7,121,713      7,506,430            ...            5.1
1912-13                     6,612,335      6,556,500            0.9            ...
1913-14                     6,542,850      6,772,350            ...            3.4

Total 1900-1914            78,740,142     79,865,971            ...            1.4
Years of overestimate      31,879,051     31,307,373            1.8            ...
Years of underestimate     46,861,091     48,558,598            ...            3.5

The  preliminary  estimates  of  the  cotton  crop  in  December 
each  year  are  checked  against  the  monthly  and  annual 
reports  of  production  by  the  Bureau  of  the  Census.  The 
census  reports,  which  are  presumed  to  be  the  most  accurate 
obtainable,  indicate  that  the  Bureau  of  Crop  Estimates 
has  overestimated  the  cotton  crop  6  times  and  under- 
estimated the  crop  8  times  in  the  past  14  years. 



The  preceding  tabulation  gives  the  annual  estimates 
of  the  Department  of  Agriculture  of  the  production  of  cotton, 
expressed  in  pounds  of  lint,  the  quantity  as  finally  reported 
by  the  Bureau  of  the  Census,  and  the  percentage  of  over- 
estimate or  underestimate  by  the  Department  of  Agri- 
culture. 

As  shown  in  the  tabulation  preceding,  during  the  past  14 
years  the  Department  of  Agriculture  has  overestimated 
the  crop  six  times  and  underestimated  it  eight  times.  In 
years  of  overestimates  the  average  error  was  1.8  per  cent; 
in  those  of  underestimates  the  average  error  was  3.5  per 
cent ;  for  the  entire  14  years  the  average  error  was  2.8  per 
cent.  Balancing  the  overestimates  and  underestimates 
shows,  for  the  entire  period,  a  net  underestimate  of  only 
1.4  per  cent. 
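
The summary percentages can be recomputed from the seasonal figures in the tabulation. The sketch below assumes, since the text does not state the base, that each percentage is taken on the Census Bureau figure and that the group figures are ratios of summed errors to summed census totals; on that assumption the results agree with the published figures when rounded to one decimal.

```python
# (season, Department of Agriculture estimate, Census Bureau report),
# both in thousands of pounds of lint, as given in the tabulation.
cotton = [
    ("1900-1", 4_856_738, 4_846_471),
    ("1901-2", 4_529_954, 4_550_950),
    ("1902-3", 5_111_870, 5_091_641),
    ("1903-4", 4_889_796, 4_716_591),
    ("1904-5", 6_157_064, 6_426_698),
    ("1905-6", 4_860_217, 5_060_200),
    ("1906-7", 6_001_726, 6_354_110),
    ("1907-8", 5_581_968, 5_312_950),
    ("1908-9", 6_182_970, 6_336_070),
    ("1909-10", 4_826_344, 4_783_220),
    ("1910-11", 5_464_597, 5_551_790),
    ("1911-12", 7_121_713, 7_506_430),
    ("1912-13", 6_612_335, 6_556_500),
    ("1913-14", 6_542_850, 6_772_350),
]

def percent_error(rows):
    """Summed absolute errors as a percentage of the summed census figures."""
    total_error = sum(abs(estimate - reported) for _, estimate, reported in rows)
    total_reported = sum(reported for _, _, reported in rows)
    return 100.0 * total_error / total_reported

overestimates = [row for row in cotton if row[1] > row[2]]
underestimates = [row for row in cotton if row[1] < row[2]]

over_error = percent_error(overestimates)
under_error = percent_error(underestimates)
average_error = percent_error(cotton)  # errors counted without regard to sign

# Net error: overestimates and underestimates are allowed to offset each other.
total_reported = sum(reported for _, _, reported in cotton)
net_underestimate = 100.0 * sum(rep - est for _, est, rep in cotton) / total_reported

print(len(overestimates), len(underestimates))
print(f"{over_error:.1f} {under_error:.1f} {average_error:.1f} {net_underestimate:.1f}")
```

The gap between the average error (errors taken without sign) and the net error (errors allowed to cancel) is the compensation of errors at work.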

REVIEW 

1.  What  is  there  in  the  first  paragraph  of  the  description  of 
Government  Crop  Reports  which  bears  upon  scientific  method? 

2. What interest has the farmer, the manufacturer, the railroads,
the salesman in crop reports? What interest have you in
such reports? Write out your answer to the last part of this
question.

3.  What  data  are  collected  by  the  Government  and  by  what 
method?  Does  this  meet  the  demands  of  good  statistical  practices? 
In what special particulars?

4.  How  are  the  reports  on  crop  estimates  actually  prepared? 
Why  the  great  caution?  Does  the  caution  seem  warranted  in 
view  of  the  size  of  the  territories  covered,  and  the  number  of  sources 
of  information?  What  bearing,  if  any,  on  the  statistical  side  of 
the  problem  has  the  statement  "  and  the  final  figures  for  each  state 
are  decided  upon"? 

5.  What  is  the  method  of  issuing  the  Crop  Estimates? 

6.  What  method  is  used  by  the  Department  to  estimate  acre- 
age ;  to  estimate  yields?  What  effects  have  biased  errors  on  both? 
How  do  the  census  methods  differ?  What  is  a  test  of  wide  differ- 
ence in  the  two  methods? 



7.  How  is  the  estimating  of  live  stock  different  from,  and  more 
or  less  difficult  than,  the  estimating  of  acreage? 

8.  What  application  to  schedule  making  has  the  experiment, 
conducted  by  the  Department,  to  secure  actual  acreage  by  rural  mail 
carriers ;  to  the  mandatory  power  of  the  scheduling  agent ;  to  the 
type  of  informant? 

9.  What  relation  to  sampling  as  a  statistical  device  has  the 
principle  of  choosing  typical  farms  for  estimating  acreage  and  live 
stock?  to  error?  How  large  a  sample  is  necessary?  What  condi- 
tions must  it  cover? 

10.  "Practical  farmers"  .  .  .  furnish  "over  99  per  cent  of  the 
crop  condition  reports."  Will  such  people  understand  what  is 
meant  by  a  "normal  crop"?  Why?  Is  this  an  acceptable  unit? 
Is such a unit likely to be better understood than the unit "good
crop," "full crop," "three-fourths of a crop"? Why not use the
expression  "average  crop"?  Why  not  compare  crop  condition 
on  the  basis  of  the  previous  year?  Why  not  estimate  it  in  terms 
of  bushels,  pounds,  tons? 

11.  How  do  you  interpret  the  figures  showing  the  degree  of 
accuracy  of  estimates  to  realized  crop?  How  is  this  subject  re- 
lated, if  at  all,  to  the  compensation  of  errors? 


SAMPLING AS AN ALTERNATIVE TO A COUNT 1
Nature  of  Timber  Estimates 

The  determination  of  the  amount  of  standing  timber 
on  a  given  area  is  a  matter  of  far  greater  difficulty  than  is 
likely to be assumed by persons who have not been concerned
with the question. To show what the difficulties
are,  the  methods  of  measuring  and  estimating  timber  must 
be  set  forth  in  some  detail. 

Measurements  of  Lumber  and  Logs.  —  Measurements  of 
lumber  and  timber  in  the  United  States  are  commonly 
made  in  terms  of  board  feet.     While  12  board  feet  make  1 

1  Adapted  with  permission  from  "The  Lumber  Industry  Pt.  I,  Standing 
Timber,"  United  States  Bureau  of  Corporations,  January  20,  1913,  pp.  45-58. 



cubic  foot,  a  tree  which  contains  200  cubic  feet  of  wood  will 
make  only  a  small  fraction  of  2400  board  feet  of  lumber. 
A  large  part  of  the  wood  —  all  the  branches  and  the  upper 
part  of  the  trunk  —  is  not  suitable  for  lumber,  and  there 
is  always  some  loss  in  the  stump.  But  the  lumber  pro- 
duced is  far  less  than  12  board  feet  for  every  cubic  foot 
of  logs  suitable  for  sawing.  The  slabs,  removed  in  squar- 
ing the  log,  are  wholly  or  largely  wasted ;  the  sawdust 
is  wasted ;  there  is  a  waste  because  of  the  difficulty  of  saw- 
ing true ;  and  there  may  be  further  losses  on  account  of  in- 
ternal defects  in  the  log.  Each  of  these  losses  varies  widely. 
The  slabs  are  a  larger  proportion  of  a  small  than  of  a  large 
log,  and  a  much  larger  proportion  of  a  crooked  log  than  of  a 
straight  one.  The  waste  from  defects  depends  upon  the 
quality  of  the  timber,  and  also  on  the  size  of  the  pieces 
sawed  out ;  for  a  defect  which  is  hidden  in  a  heavy  timber 
or  even  in  a  3-inch  plank  may  come  to  light  in  a  board. 
The  waste  from  the  difficulty  of  accurate  sawing  varies  with 
the  wood,  with  the  character  of  the  mill,  and  with  the  skill 
of  the  sawyer.  The  waste  from  sawdust  varies  with  the 
thickness  of  the  saw  and  with  the  size  of  the  lumber  made. 
Some  large  circular  saws  take  out  a  kerf  three-eighths  of  an 
inch  wide,  or  even  more.  Smaller  ones  may  take  one- 
fourth  of  an  inch.  Many  band  saws  and  gang  saws  work 
on  one-eighth  or  little  more.  A  few  are  said  to  cut  as  little 
as  one-sixteenth. 

With  a  saw  that  takes  out  a  quarter-inch  kerf,  a  thick- 
ness of  an  inch  and  a  quarter  is  required  for  getting  out  a 
1-inch  board  ;  one-fifth  is  lost  in  sawdust.  If  2-inch  planks 
are  sawed,  the  waste  is  only  one-ninth ;  if  timbers,  say  12 
inches  square,  the  kerf  is  unimportant. 
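
The kerf arithmetic in this paragraph can be written out directly. In the sketch below the function name is an illustrative choice, and it assumes one saw cut is charged to each piece produced, which is the basis on which the text's figures of one-fifth and one-ninth are obtained.

```python
from fractions import Fraction

def sawdust_fraction(piece_thickness, kerf):
    """Share of the wood consumed per piece that is lost as sawdust.

    Each piece of the given thickness uses up its own thickness plus
    one kerf, so the loss is kerf / (thickness + kerf).
    """
    return kerf / (piece_thickness + kerf)

quarter_inch_kerf = Fraction(1, 4)

one_inch_board = sawdust_fraction(1, quarter_inch_kerf)
two_inch_plank = sawdust_fraction(2, quarter_inch_kerf)
twelve_inch_timber = sawdust_fraction(12, quarter_inch_kerf)

print(one_inch_board, two_inch_plank, twelve_inch_timber)  # 1/5 1/9 1/49
```

For the 12-inch timber the loss is about 2 per cent, which is why the text calls the kerf unimportant in that case.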

The  contents  of  logs  are  reckoned  by  lumbermen  in  board 
feet. For this purpose, however, the "contents" are not the



full  volume,  but  the  quantity  of  lumber  that  a  log  may  be 
expected  to  make.  As  has  been  shown  in  the  preceding 
paragraphs,  the  product  depends  on  many  things  besides 
the  length  and  diameter  of  a  log.  At  different  mills,  and 
under  different  circumstances,  the  product  of  exactly  simi- 
lar logs  may  vary  materially ;  and  at  the  same  mill,  and 
under  the  same  circumstances,  one  log  may  produce  con- 
siderably more  than  another  whose  gross  cubic  contents 
are the same. The measurement or "scaling" of logs,
therefore,  is  not  a  mathematically  accurate  determination 
of  their  volume,  but  an  approximate  determination  of  the 
quantity  of  lumber  they  are  likely  to  yield. 

For  this  purpose,  lumbermen  commonly  use  a  measure 
called  a  log  scale  or  scale  stick.  This  is  a  flat  stick,  a  quarter 
of  an  inch  or  more  in  thickness  and  about  an  inch  and  a 
quarter  broad.  The  edges  are  often  graduated  in  inches. 
On  the  faces  are  usually  six  graduations,  three  on  one  and 
three  on  the  other,  for  six  lengths  of  logs.  These  gradua- 
tions run  lengthwise  of  the  stick,  and  show  the  contents 
in  board  feet,  at  each  diameter,  for  logs  of  each  length. 
The  length  of  a  given  log  is  first  determined,  usually  by  the 
eye;  the  stick  is  then  laid  across  the  small  end,  and  the 
contents  in  board  feet  are  read  off.  The  reading  is  supposed 
to  give  the  contents  of  a  straight,  sound  log ;  and  if  a  log 
is  crooked  or  unsound,  the  scaler  makes  a  deduction  ac- 
cording to  his  judgment.  The  measuring  sticks  are  grad- 
uated according  to  tables,  called  log  scales  or  log  rules, 
which  give  the  supposed  product  of  logs  of  different  diameters 
and  lengths.  Many  such  tables  have  been  constructed, 
some  from  diagrams,  some  by  mathematical  formulae,  some 
by  measurement  of  logs  sawed  and  their  product,  and  some 
by combinations of these methods. The Woodman's Handbook,
published as Bulletin 36 of the Forest Service, gives



44  different  rules.  The  differences  among  them  are  as- 
tonishingly wide.  For  a  16-foot  log,  24  inches  in  diameter, 
the  computed  contents  range  from  268  board  feet  to  500; 
for  a  12-foot  log,  6  inches  in  diameter,  from  3  board  feet 
to  20. 

For  a  log  12  feet  long  and  6  inches  in  diameter,  most 
rules  give  values  ranging  from  12  feet  to  20.  Yet  the 
Doyle  rule,  which  gives  only  3  feet,  is  more  widely  used 
than  any  other.  It  is  far  more  inaccurate  for  small  logs 
than  for  large,  yet  in  great  areas  of  the  country  it  is  used 
for  small  logs  only.  There  is  another  rule  of  long  and  wide 
acceptance,  the  Scribner,  which  gives  smaller  values  than 
the  Doyle  for  the  larger  diameters  and  much  larger  values 
for  the  smaller  diameters.  A  combination  of  the  two  has 
been  made,  by  taking  the  smaller  value  for  each  size  of  logs, 
with  very  few  exceptions.  This  combination,  called  the 
Doyle-and-Scribner  rule,  is  the  scale  chiefly  used  in  many 
parts  of  the  Eastern,  Southern,  and  Middle  Western  States. 
Mills  which  use  the  Doyle  or  the  Doyle-and-Scribner  rule, 
and  which  cut  small  logs,  often  have  an  "overrun"  of  20, 
30,  sometimes,  with  thin  saws,  of  40  or  50  per  cent ;  that 
is,  their  actual  product  of  lumber  exceeds  by  so  much  the 
scale  of  the  logs  they  saw. 
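
The text does not give the formula behind the Doyle rule. As it is commonly stated, the rule deducts 4 inches from the small-end diameter, squares the remainder, and multiplies by the length in feet divided by 16. The sketch below uses that statement of the rule as an assumption; it does reproduce the 3 board feet quoted above for a 12-foot log 6 inches in diameter. The mill figures used for the overrun are hypothetical.

```python
def doyle_board_feet(diameter_inches, length_feet):
    """Doyle log rule as commonly stated: ((D - 4)^2 * L) / 16 board feet."""
    return (diameter_inches - 4) ** 2 * length_feet / 16

def overrun_percent(lumber_sawed_bf, log_scale_bf):
    """Per cent by which the lumber actually sawed exceeds the log scale."""
    return 100.0 * (lumber_sawed_bf - log_scale_bf) / log_scale_bf

small_log_scale = doyle_board_feet(6, 12)    # the 12-foot, 6-inch log of the text
large_log_scale = doyle_board_feet(24, 16)   # the 16-foot, 24-inch log of the text

# Hypothetical mill: logs scaled at 10,000 board feet yield 13,000 of lumber.
example_overrun = overrun_percent(13_000, 10_000)

print(small_log_scale, large_log_scale, example_overrun)  # 3.0 400.0 30.0
```

Because the fixed 4-inch deduction is a large share of a small diameter and a small share of a large one, the rule understates small logs most severely, which is consistent with the heavy overruns reported for mills cutting small timber.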

Usually  the  timber  owned  by  a  sawmill  will  give  quite 
uniform  results  when  handled  under  the  same  conditions. 
Defects  are  characteristic,  not  only  of  the  species  but  also 
of  the  district  where  the  trees  grow,  and  by  keeping  records 
comparing  the  actual  yield  with  the  scale  of  the  logs  it  is 
possible  to  determine  the  approximate  relation  between 
the  two.  The  mill  may  thus  compute  the  average  overrun 
shown  by  its  experience,  and  then  reckon  that  its  logs  will 
in all likelihood yield approximately the same percentage
above  the  scale. 



Estimating  Standing  Timber.  —  It  has  perhaps  been  made 
clear  enough  that  many  uncertainties  are  involved  in  the 
scaling  of  logs.  Even  aside  from  the  element  of  individual 
judgment,  in  allowing  for  defects,  the  mere  application 
of  the  rules  to  straight  and  sound  logs  gives  results  which 
only  approximate  the  product  of  the  saw. 

The  estimating  of  standing  timber  introduces  further 
difficulties.  The  ideal  of  accuracy,  from  the  standpoint 
of  the  "cruiser"  making  the  estimate,  would  be  to  reach 
the  same  result  that  would  be  reached,  after  the  trees  were 
felled,  by  the  scaling  of  the  logs.  As  just  shown,  this  ideal 
falls  far  short  of  an  accurate  measure  of  the  resultant 
lumber ;  but  this  very  imperfect  ideal  is  not  approached 
in  most  estimates  of  standing  timber.  It  can  be  approached 
by  detailed  calculation.  Every  merchantable  tree  can  be 
counted,  its  diameter  measured,  and  even  its  height.  There 
may  still  be  shrinkages  between  tree  and  log  that  cannot 
be  determined  beforehand.  There  may  be  concealed  hol- 
lows ;  in  some  species,  as  cypress,  there  will  be  many. 
There  may  be  much  breakage  in  felling;  this  is  a  heavy 
loss  in  redwood.  But,  waiving  such  points,  counting  and 
measuring  are  enormously  expensive,  and  such  a  method 
is  hardly  ever  used  in  practice.  Even  if  the  trees  are 
counted,  the  average  diameter  is  usually  estimated  by  the 
eye,  and  the  supposed  normal  content  of  the  tree  of  this 
diameter  is  multiplied  by  the  number  of  trees.  This  nor- 
mal content  is  based  on  the  estimator's  experience  or  on 
volume  tables.  Even  the  counting  of  trees  is  not  only  slow 
and  expensive,  but  difficult.  It  is  hard  to  be  sure  of  getting 
them  all  and  counting  none  twice. 

Oftener  no  attempt  is  made  to  count  every  tree,  but 
sample  plots,  perhaps  of  an  acre  each,  are  laid  off  by  pacing 
or  with  a  surveyor's   chain,   and   the   trees  on   them   are 



counted.  The  result  is  taken  as  the  average  stand  on  the 
larger  area  which  the  samples  represent. 
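
The sample-plot method amounts to expanding an average count per acre to the whole tract. A minimal sketch follows; the plot counts and the tract area are hypothetical figures chosen for illustration, and the function name is not taken from the source.

```python
from statistics import mean

def estimate_total_trees(plot_counts, plot_area_acres, tract_area_acres):
    """Expand the mean count per sample plot to the whole tract."""
    trees_per_acre = mean(plot_counts) / plot_area_acres
    return trees_per_acre * tract_area_acres

# Hypothetical cruise: five one-acre sample plots on a 640-acre tract.
plot_counts = [38, 45, 41, 52, 44]
estimated_trees = estimate_total_trees(plot_counts, 1.0, 640)

print(estimated_trees)  # 28160.0
```

The estimate is only as good as the plots are representative; plots laid off in the best or the most accessible timber would carry their bias into the total for the whole area.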

Far  the  commonest  method  of  estimating,  however,  is 
simply  to  look  the  forest  over,  without  any  counting  or 
measuring.  The  examination  may  be  made  with  less  or 
greater  care.  The  cruiser  may  tramp  back  and  forth  on 
parallel  paths  only  a  few  rods  apart,  or  he  may  make  only 
one  trip  through  a  strip  a  mile  wide.  He  may  tramp  all 
day  without  making  a  note,  and  set  down  at  night  his  esti- 
mate of  the  area  he  has  covered  and  of  the  whole  amount 
of  timber  he  has  passed  through. 

By  long  experience,  men  learn  to  form  judgments  by 
these rough methods, which, on an average, approximate
fairly  the  scale  of  the  logs.  The  general  tendency  is  to 
estimate  below  the  truth,  because  the  estimator  desires  to  be 
"safe";  that  is,  not  to  have  his  estimate  subsequently 
proved  too  large  by  other  cruisers  or  by  the  results  at  the 
mill.  To  overestimate  reflects  on  the  cruiser.  The  owner 
will  not  complain  if  the  cut  shows  more  timber  than  the 
estimate,  but  he  will  be  displeased  —  especially  if  he  bought 
on  the  estimate  —  if  the  cut  shows  less. 

Moreover,  an  estimate  which  is  accurate  according  to 
the  customs  of  one  time  will  be  inaccurate  according  to 
those  of  another,  because  the  standards  of  merchantable 
timber  change.  With  higher  prices  for  lumber,  more  logs 
are  brought  to  the  mill  from  the  same  tract  and  more  board 
feet  of  lumber  are  made  out  of  the  same  log,  because  the 
manufacturer  is  able  to  sell  some  low-grade  lumber  not 
previously  marketable.  Again,  some  species  formerly  re- 
garded as  worthless  and  not  included  in  estimates  become 
valuable  with  higher  prices  and  increase  the  estimates  of 
merchantable  timber  by  their  amount.  This  has  been  true 
of  every  timber  region  in  the  past,  and  as  values  rise  and 



timber  is  cut  closer  in  the  future,  estimates  will  rise  far 
above  those  which  are  used  to-day. 

If  two  estimates  of  the  same  tract,  made  at  the  same 
time,  do  not  differ  more  than  10  per  cent,  they  agree  quite 
as  closely  as  can  be  expected.  Good  estimators  often 
differ  25  per  cent,  and  sometimes  even  50  per  cent.  An 
important  tract  of  pine  in  northern  Minnesota  was  exam- 
ined by  three  companies  in  1909,  with  a  view  to  pur- 
chase. One  estimated  it  at  125,000,000  feet,  and  another  at 
135,000,000.  The  seller's  estimate  was  170,000,000,  and  on 
this  basis  the  third  company  bought  it.  The  purchase  was 
made,  however,  against  the  opposition  of  a  member  of  the 
buying  company,  who  is  reputed  to  be  one  of  the  best  timber- 
men  in  Minnesota,  and  who  estimated  the  tract  at  from 
95,000,000  to  110,000,000.  The  accepted  estimate  exceeded 
his  by  more  than  50  per  cent,  and  if  the  mean  of  his  figures 
be  taken  as  representing  his  opinion  the  independent  estimates 
of  other  prospective  buyers  exceeded  his  by  20  or  30  per  cent. 
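
The comparisons in the Minnesota example can be verified as follows. The sketch takes the mean of the dissenting member's two figures as his estimate, as the text proposes; the function and variable names are illustrative.

```python
def percent_above(estimate, reference):
    """Per cent by which one estimate exceeds a reference estimate."""
    return 100.0 * (estimate - reference) / reference

# All figures in millions of board feet, as quoted in the text.
dissenting_low, dissenting_high = 95, 110
dissenting_mean = (dissenting_low + dissenting_high) / 2  # 102.5

seller_estimate = 170
other_buyer_estimates = [125, 135]

seller_vs_mean = percent_above(seller_estimate, dissenting_mean)
seller_vs_high = percent_above(seller_estimate, dissenting_high)
others_vs_mean = [percent_above(e, dissenting_mean) for e in other_buyer_estimates]

print(round(seller_vs_mean, 1), round(seller_vs_high, 1))
print([round(p, 1) for p in others_vs_mean])
```

Even against the upper limit of the dissenting estimate, the accepted figure is more than 50 per cent higher; against the mean it is about two-thirds higher.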

The  following  table  shows  the  average  results,  by  years, 
of  two  series  of  estimates  —  first,  those  made  by  a  company 
in  the  North  Carolina  pine  region  for  purposes  of  purchase  ; 
second,  those  made  by  the  State  of  Minnesota  on  timber 
owned  by  the  State  for  purposes  of  sale.  The  quantities 
given  as  cut  represent  the  scale  of  the  logs ;  the  quantity 
of  lumber  actually  sawed  was  materially  greater. 

The  southern  company  usually  paid  a  lump  sum  for  a 
tract,  and  the  prices  it  offered  were  fixed  on  the  basis  of  its 
estimates.  It  would  try  to  get  a  fair  idea  of  the  timber 
it  was  buying,  but  would  wish  to  err  rather  on  the  con- 
servative than  on  the  liberal  side.  The  State  of  Minnesota 
did  not  sell  its  timber  at  so  much  for  a  tract,  but  at  so  much 
a  thousand,  and  the  payments  were  determined  by  the 
quantity  of  logs  scaled. 



Estimated Amounts of Timber on Certain Tracts, Classified
by Year of Purchase, with the Amounts Cut Therefrom

       Timber Bought by a Southern Company  Timber Sold by the State of Minnesota
         Estimated         Cut    Cut, Per    Estimated          Cut     Cut, Per
                                   Cent of                                Cent of
Year        M feet      M feet    Estimate       M feet       M feet     Estimate

1886           ...         ...         ...       24,540       37,859        154.3
1887           ...         ...         ...       40,472       34,021         84.1
1888           ...         ...         ...       20,400       27,488        134.7
1889           ...         ...         ...       33,040       49,952        151.2
1890           ...         ...         ...       52,130       63,681        122.2
1891           ...         ...         ...       78,710      176,784        224.6
1892           ...         ...         ...       29,135       72,680        249.5
1893           ...         ...         ...       23,795       36,791        154.6
1894           ...         ...         ...       33,870       42,856        126.5
1895         2,550       3,774       148.0       27,403       41,010        149.7
1896           460       1,249       271.5        2,600        1,758         67.6
1897        41,075      53,508       130.3       51,322       68,598        133.7
1898        25,648      31,718       123.7       30,643       42,688        139.3
1899        29,355      37,198       126.7        4,035        3,484         86.3
1900         4,575       5,997       131.1       69,128       71,958        104.1
1901         4,485       5,539       123.5       25,400       29,565        116.4
1902         4,550       5,487       120.6       52,710       53,922        102.3
1903         3,505       3,525       100.6       70,875       82,045        115.8
1904        24,930      23,534        94.4       32,900       36,718        111.6
1905        10,142       9,556        94.2       68,078      105,970        155.7
1906        10,085       9,917        98.3       26,705       47,227        176.8
1907           590         824       139.7       22,795       27,105        118.9
1908         1,200       1,353       112.8          (1)          ...          ...
1909         2,985       3,426       114.8        2,165        5,541        255.9

Total      166,141     196,605       118.3      822,851    1,159,701        140.9

1 No sales.

Some of the earlier purchases of the southern company
stood several years between buying and cutting, and if the
timber was immature the quantity may have increased
somewhat by growth. This element is believed to have
been of minor importance, however, and it does not enter
in the case of the Minnesota timber. That was usually
cut within two or three years after the estimate was made;
and in any case the timber was mature, and the decay of the
old trees probably balanced the growth of the young.

Under  these  circumstances,  the  scale  of  the  logs  from  the 
Minnesota  timber,  taking  all  the  sales  of  each  year  together, 
was  usually  from  10  to  60  per  cent  above  the  estimate,  with 
an  average  of  40  for  the  whole.  The  sales  of  1896  cut  only 
two-thirds of the estimate; those of 1892 and 1909 cut
2½ times the estimate.

In  the  case  of  the  southern  company,  reckoning  its  pur- 
chases by  annual  aggregates,  the  purchases  of  most  years, 
so  far  as  they  have  been  cut,  have  produced  logs  exceed- 
ing the  estimates  by  from  10  to  40  per  cent,  with  an  average 
of  18  for  the  whole.  Three  years  show  a  shortage  of  from 
2 to 6 per cent, and one rather small lot went above 2½
times the estimate.

The  variation  is  greater  on  particular  tracts  than  on 
yearly  aggregates.  The  following  table  shows  the  estimates 
and  the  scale  of  the  logs,  in  detail,  for  the  several  tracts 
bought  by  the  southern  company  in  1909  : 

Estimated Amounts of Timber on Certain Tracts Bought
by a Southern Company in 1909, and the Amounts Cut
Therefrom

            Estimated          Cut  Cut, Per Cent
               M feet       M feet    of Estimate

                   75           80          106.7
                   40           35           87.5
                   20           15           75.0
                  550        1,059          192.5
                  675          443           65.6
                  200          161           80.5
                  425          500          117.6
                1,000        1,133          113.3

Total           2,985        3,426          114.8



On  the  whole  year's  purchases  the  scale  of  the  logs  varied 
only  15  per  cent  from  the  estimates  ;  but  on  particular  tracts 
the  result  ranged  from  34  per  cent  below  the  estimate  to 
92  per  cent  above. 
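The contrast between the yearly aggregate and the particular tracts can be checked directly from the 1909 tract figures given above. The following is a minimal sketch in Python; the function name `pct_of_estimate` is chosen here for illustration and does not come from the report.

```python
# Estimated and cut volumes (thousand board feet) for the eight tracts
# bought by the southern company in 1909, as given in the table above.
tracts = [(75, 80), (40, 35), (20, 15), (550, 1059),
          (675, 443), (200, 161), (425, 500), (1000, 1133)]


def pct_of_estimate(estimated, cut):
    """Return the cut expressed as a percentage of the estimate."""
    return 100.0 * cut / estimated


# Ratio for each tract taken separately.
per_tract = [round(pct_of_estimate(e, c), 1) for e, c in tracts]

# Ratio for the year's purchases taken as a whole.
total_est = sum(e for e, _ in tracts)
total_cut = sum(c for _, c in tracts)
aggregate = round(pct_of_estimate(total_est, total_cut), 1)

print("per tract:", per_tract)
print("aggregate:", aggregate)
print("range:", min(per_tract), "to", max(per_tract))
```

The aggregate ratio is close to the estimate because the errors on the separate tracts partly offset one another when the tracts are added together.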

The  following  table,  except  the  percentages,  is  taken 
from  the  report  of  the  Commissioner  of  the  General  Land 
Office  for  1910,  page  15.  It  gives  the  results  of  logging 
on  ceded  Chippewa  lands  in  Minnesota,  grouping  the  tracts 
according  to  date  of  sale.  Payment  is  based  on  the  amount 
actually  cut. 


Estimated Amounts of Timber on Certain Ceded Chippewa
Lands in Minnesota, Grouped by Date of Timber Sale,
with the Amounts Cut Therefrom

Date of Sale            Government         Cut   Cut, Per Cent
                          Estimate                 of Estimate
                            M feet      M feet

March 2, 1903               13,636      26,816           196.7
December 5, 1903           223,921     308,637           137.8
December 28, 1903          169,308     296,155           174.9
November 15, 1904          146,560     168,113           114.7
November 17, 1904            9,718      18,786           193.3
July 17, 1907                2,056       3,754           182.6
March 15, 1910               2,169       2,189           100.9

Total                      567,368     824,450           145.3

On  the  whole  quantity  the  log  scale  exceeded  the  esti- 
mate by  45  per  cent.  On  the  tracts  sold  November  17, 
1904,  and  on  those  sold  March  2,  1903,  the  log  scale  was 
nearly  double  the  estimate. 

Professional  cruisers  keep  as  well  informed  as  possible 
on  the  relation  between  their  estimates  and  the  results 
shown in cutting the timber, and thus modify their judgment
with experience. This is especially true in the first
years  of  their  work  as  cruisers,  or  when  going  from  one 
timber  region  to  another  of  very  different  character,  or 
during  periods  of  marked  change  in  the  standards  of  mer- 
chantable timber.  On  first  going  from  the  Lake  States 
to  the  Pacific  coast,  cruisers  made  estimates  far  below  the 
truth,  because  the  stands  per  acre  were  so  enormous  that 
men  accustomed  to  eastern  stands  could  not  grasp  or  accept 
them.  It  is  only  during  recent  years  that  estimates  for 
western  timber  have  been  made  close  to  the  actual  yield. 

Methods  Followed  in  the  Investigation 

In  the  preceding  section,  the  effort  has  been  made  to  show 
how  far  from  exactness  is  the  art  of  estimating  timber. 
Even  when  the  estimates  are  made  with  what  is  consid- 
ered reasonable  care,  for  the  purpose  of  purchase  or  sale, 
they  are  uncertain.  In  naming  offhand  the  probable 
contents  of  a  tract  which  he  has  never  carefully  examined 
but  has  only  a  general  knowledge  of,  a  man  will  of  course 
do  worse,  on  an  average,  than  in  giving  an  estimate  on  a 
tract  which  he  has  just  examined  for  the  purpose.  The 
most  experienced  lumberman  can  know  but  a  comparatively 
small  area  by  careful  examination.  When  he  undertakes 
to  make  a  general  estimate  for  a  district,  even  for  a  few 
townships,  he  must  usually  depend  partly  on  general  ob- 
servation and  partly  on  the  opinions  of  others. 

Most  individuals  and  corporations  owning  important 
tracts  have  had  fairly  good  estimates  of  their  timber  made, 
either  recently  or  in  earlier  years,  and  in  the  latter  case  they 
usually  have  a  fairly  definite  opinion,  based  on  the  results 
of  cutting  or  on  general  information,  regarding  the  per  cent 
by which the old estimate should be increased to make it
approach present-day standards of merchantable timber.
The  owners  of  timber,  cruisers,  loggers,  timber  dealers,  and 
the  responsible  employees  of  timber  and  lumber  companies 
are  often  well  acquainted  with  the  approximate  amount 
of  timber  in  holdings  other  than  those  in  which  they  are 
directly  interested,  and  also  well  informed  regarding  the 
probable  total  amount  of  timber  in  certain  survey  town- 
ships or  other  subdivisions  of  a  county,  or  in  the  county 
as  a  whole.  Thus,  there  exists  in  the  records  of  timber 
owners  and  in  the  minds  of  men  a  basis  for  arriving  at  the 
approximate  amount  of  timber  in  a  State  and  in  the  coun- 
try. The  accuracy  of  the  results  which  may  be  obtained 
from  these  sources  depends  largely  on  the  willingness  and 
truthfulness  with  which  the  informants  give  the  informa- 
tion they  possess,  and  on  the  perfection  of  the  methods 
by  which  this  information  is  gathered  in  detail  by  small 
areas  and  is  studied. 

The  only  better  method  would  be  a  careful  examina- 
tion of  the  timbered  area  by  public  officers.  The  result 
would  still  be  a  collection  of  opinions,  not  of  mathematical 
determinations ;  but  the  opinions  would  have  more  value, 
other  things  being  equal,  in  proportion  as  they  were  based 
on  more  careful  and  detailed  examination  of  the  timber. 
By  the  expenditure  of  time  and  money,  they  might  be 
raised  to  any  degree  of  accuracy,  up  to  the  point  where 
they  should  represent  a  count  and  measurement  of  every 
tree. 

A  count  and  measurement  of  all  merchantable  trees, 
however,  or  even  a  count  without  measurement,  would, 
of  course,  not  be  thought  of.  Such  work  is  so  expensive 
that  most  timber  is  bought  and  sold  without  it;  and  a 
procedure which men cannot afford to use for their guidance
in buying and selling is far too expensive for any statistical
inquiry. The only proposal which could be thought
of  would  be  an  estimate  by  general  observation,  perhaps 
supplemented,  where  the  forest  was  practically  unbroken, 
with  a  count  on  sample  tracts.  The  cost  of  such  an  esti- 
mate would  vary  with  the  minuteness  of  it,  but  the  rough- 
est canvass  that  would  be  worth  making  would  be  a  matter 
of  some  millions  of  money  and  some  years  of  time.  Even 
if  money  were  unlimited,  it  is  not  likely  the  work  could 
be  tolerably  well  done  in  ten  years  for  lack  of  men.  The 
estimating  of  timber  is  an  art  acquired  by  much  practice. 
The  men  skilled  in  it  are  few,  and  they  are  employed  in  cur- 
rent business.  They  could  hardly  be  diverted  in  the  nec- 
essary numbers  to  an  official  investigation.  Furthermore, 
such  a  plan  would  give  information  on  the  total  amount  of 
timber  only  and  nothing  regarding  the  ownership  of  it. 
To  provide  such  data,  it  would  be  necessary  to  first  obtain 
records  of  the  ownership,  and  then  to  make  the  observa- 
tions separately  for  each  holding,  which  would  greatly 
increase  the  expense  and  the  time. 

Methods  Adopted.  —  The  problem  before  the  Bureau 
of  Corporations  was  to  provide  a  plan  which  would  give, 
within  reasonable  expense  and  time,  as  accurate  infor- 
mation as  the  nature  of  the  problem  allowed  regarding  all 
large  holdings  separately,  and  regarding  the  scattered  small 
holdings  as  a  whole,  in  order  to  determine  the  proportion 
between  the  timber  owned  in  holdings  of  certain  specified 
sizes  and  the  total  timber  in  the  country.  Under  the  plan 
adopted,  the  investigation  of  the  amount  of  timber  in  all 
small  holdings  proceeded  side  by  side  with  the  investiga- 
tion of  the  essential  facts  regarding  large  holdings,  in 
such  a  way  that  the  latter  checked  and  contributed  to  the 
former. 

The  work  was  guided  by  the  following  principles : 



1.  The  available  resources  would  not  permit  the  em- 
ployment of  estimators  with  a  view  to  the  examination  of 
timber.  Any  estimate  must  therefore  be  based  on  data 
already  existing,  in  records  or  in  the  minds  of  men. 

2.  The  estimate  of  the  timber  on  each  area  should  be 
derived  from  the  records  or  the  opinions  of  those  most  fa- 
miliar with  it,  and  as  many  records  and  opinions  as  pos- 
sible should  be  obtained  regarding  it,  in  order  to  give  a 
constant  check  on  the  work  and  to  enable  the  Bureau  to 
arrive  at  the  best  estimate  from  a  thorough  study  of  all 
the  available  evidence  in  detail. 

3.  A  separate  report  should  be  made  for  each  holding 
of  60  million  board  feet  or  more.  Information  regarding 
each  such  holding  should  be  obtained  from  as  many 
sources  as  possible. 

4. For the total timber in holdings of less than 60 million
board feet, the best local evidence must be relied on.
Estimates  should  be  obtained  for  the  smallest  possible 
units  of  area,  and  the  opinions  of  each  authority  should 
have  special  weight  for  the  neighborhood  which  he  knows 
best.  .  .  . 

A  few  reports  were  obtained  by  mail,  but  for  nearly  all 
owners  the  schedule  was  filled  by  special  agents  of  the  Bureau 
visiting  the  informants.  With  regard  to  the  amount  of 
timber,  the  essential  items  are  these :  The  number  of  acres, 
the  exact  location  of  the  land,  and  detailed  estimates 
.  .  .  which  would  enable  the  Bureau  to  judge  the  accu- 
racy of  the  estimate.  All  States  of  the  investigation  area 
except  Virginia,  North  Carolina,  South  Carolina,  Georgia, 
and  Texas  are  surveyed  under  the  rectangular-survey  sys- 
tem, and  there  it  was  possible  to  show  the  exact  location 
of  the  timber  holdings.  ...  In  Virginia,  North  Carolina, 
South Carolina, Georgia, and Texas, maps or blueprints
showing the exact location of the land were obtained wherever
available,  and  other  holdings  were  located  descriptively 
as  accurately  as  possible  by  political  subdivisions  of  the 
county  and  by  the  relations  of  the  holdings  to  towns,  rail- 
roads, streams,  etc.  The  largest  holders  in  these  States 
usually  have  maps  showing  the  exact  location  of  their  lands. 
In  the  rectangular-survey  States,  the  agents  did  not  secure 
the  exact  location  in  every  case,  and  a  relatively  few  hold- 
ings were  located  only  by  counties  or  as  in  certain  survey 
townships ;  but  in  nearly  all  cases  the  exact  location  was 
obtained.  .  .  . 

Field  work  was  begun  by  sending  agents  into  the  lumber 
centers,  which  are  the  headquarters  of  many  of  the  largest 
owners  of  standing  timber.  The  reports  from  them  were 
tabulated  by  counties,  showing  for  each  holder  in  the  county 
the  number  of  acres,  amount  of  timber,  and  stand  per  acre, 
and  the  land  was  platted  on  county  maps  with  a  different 
symbol  for  each  holding.  In  the  five  States  without  the 
rectangular  survey,  the  location  of  holdings  could  be  shown 
exactly  wherever  blueprints  or  maps  had  been  furnished, 
and  in  other  cases  only  descriptively.  With  these  records 
of  information  already  obtained,  an  agent  of  the  Bureau 
was  sent  into  every  one  of  about  900  counties  in  the  in- 
vestigation area.  His  instructions  were  to  seek  out  every 
reliable  local  informant  and  secure  all  available  informa- 
tion that  would  verify  or  correct  the  reports  already  ob- 
tained, to  secure  a  separate  report  on  each  remaining  holder 
in  the  county  who  had  as  much  as  60  million  feet  in  the 
United  States,  and  to  secure  data  in  as  much  detail  and 
from  as  many  different  sources  as  possible  regarding  the 
total  timbered  acreage  and  the  amount  of  timber  in  all 
holdings  not  separately  reported,  including  the  small  scat- 
tered tracts  sometimes  referred  to  as  "farmers'  woodlots." 



By  adding  the  holders  separately  reported,  as  he  obtained 
them,  to  the  county  map  above  mentioned,  the  agent  was 
able  to  proceed  systematically  in  obtaining  data  regarding 
all  land  within  the  timber  line  of  the  county.  For  many 
of  the  counties  in  the  Southern  Pine  Region  and  in  the  Lake 
States  it  was  not  practicable  to  obtain  these  estimates  on 
the timber in the county or subdivisions of it, such as survey
or political townships, exclusive of the reported holdings
of at least 60 million feet. In such counties it therefore
became necessary to secure the estimates on the total
timber  in  the  county  or  a  subdivision  of  it,  and  then  to 
obtain  the  amount  in  holdings  of  less  than  60  million  by  sub- 
tracting the  total  timber  reported  in  holdings  of  that  amount 
or  more.  This  was  especially  true  in  the  five  States  not 
having  the  rectangular  survey.  But  in  the  five  States  of 
the  Pacific-Northwest,  containing  the  great  supply  of  tim- 
ber, the  estimates  of  the  total  in  holdings  not  separately 
reported  were  obtained  almost  without  exception  by  the 
use  of  maps  showing  the  location  of  reported  holdings. 
The  informants,  with  these  maps  before  them,  made  gen- 
eral estimates  on  the  timberland  not  so  platted. 
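The method of subtraction described above amounts to simple arithmetic, sketched below in Python. All figures are hypothetical and the function name `small_holdings_timber` is an assumption made for illustration; the report gives no worked example of this step.

```python
def small_holdings_timber(county_total, large_holdings):
    """Timber in holdings below the reporting limit, found by difference.

    county_total   -- estimate of all timber in the county or subdivision
    large_holdings -- timber in each separately reported large holding
    """
    reported = sum(large_holdings)
    if reported > county_total:
        # The two sources disagree; the estimates need to be re-examined.
        raise ValueError("reported holdings exceed the county estimate")
    return county_total - reported


# Hypothetical figures, in million board feet.
county_total = 900
large_holdings = [310, 140, 75]   # each at least 60 million feet

print(small_holdings_timber(county_total, large_holdings))
```

Any error in the county total or in the reported holdings passes directly into the remainder, which is one reason the map-based estimates used in the Pacific-Northwest were preferred where they could be had.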

All holdings of less than 60 million feet for which separate
information was easily available, or which were made the
subject of inquiry through belief that they might be above
the limit, were separately reported, and were then tabulated
and platted like the larger holdings. The proportion
of  the  total  timber  in  holdings  of  less  than  60  million  thus 
separately  reported  is  very  high  in  some  States,  and  this 
increases  the  accuracy  of  the  work. 

For  the  timber  in  holdings  of  at  least  60  million  feet,  the 
primary  reliance  was  on  the  estimates  of  the  owners  or 
their  representatives.  But  these  estimates  were  not 
treated as necessarily conclusive. Many of them were
made years ago, and omitted kinds or sizes of trees that
were  not  then  accounted  merchantable,  but  are  so  accounted 
now.  Many  were  admittedly  only  rough  approximations. 
A  very  few  owners,  especially  such  as  have  borrowed  money 
on  their  timber,  were  disposed  to  claim  more  than  they  pos- 
sessed ;  very  many  holders  did  not  wish  the  Bureau  to 
know  how  large  their  holdings  were.  Some  purposely 
gave  erroneous  information ;  others  avoided  the  issue  by 
giving  access  to  old  records  which  did  not  show  the  amount 
of  timber  under  present  standards,  and  withholding  more 
recent  records  and  facts  within  their  personal  knowledge. 
Agents  were  instructed  to  watch  for  errors  from  all  these 
causes  and  to  gather  such  evidence  as  might  be  available 
for  correcting  them.  The  owner's  estimate  was  taken 
as  prima  facie  evidence  of  the  amount  of  his  holding,  but 
it  was  checked,  wherever  possible,  with  the  estimates  of 
other  competent  persons,  such  as  former  owners,  timber 
estimators  who  had  examined  the  tract,  business  asso- 
ciates, and  others. 

While  the  platting  of  the  land  owned  by  each  holder  and 
the  replatting  of  it  on  county  maps  required  a  great  deal 
of  time,  the  work  was  absolutely  essential  to  the  investi- 
gation. An  informant  who  would  have  understated  the 
acreage  owned  was  deterred  therefrom  by  having  to  show 
its  location.  Through  the  use  of  the  plats,  other  men  could 
be  interviewed  regarding  the  amount  of  timber  on  the  hold- 
ing or  such  subdivisions  of  it  as  they  were  familiar  with. 
Occasionally  land  not  reported  by  the  owner  but  otherwise 
indicated  as  owned  by  him  could  be  added  through  further 
inquiry  and  the  owner's  supplemental  statement.  Again, 
the  use  of  plats  prevented  duplication,  and  made  it  possible 
to  say  positively  that  a  holding  reported  under  one  name 
was or was not the same, wholly or in part, as a holding
reported by another agent under another name. Such
duplication results from transfer of ownership during the
inquiry  and  from  occasional  uncertainty  on  the  part  of 
local  informants  regarding  the  owner  of  record.  An  esti- 
mate on  a  given  tract  might  be  obtained  from  the  cor- 
poration or  individual  who  owned  it  at  the  time,  and  some 
months  later,  in  another  State,  the  estimate  might  be  ob- 
tained from  a  corporation  or  individual  who  had  bought 
the  tract  in  the  meantime. 

As  has  been  said,  the  field  work  was  begun  in  the  lumber 
centers,  where  men  may  be  found  who  own  timber  from 
Florida  to  Washington.  The  agents  were  invariably  in- 
structed to  report  information  from  every  authoritative 
source,  on  all  timber  wherever  situated.  On  the  holding 
of  an  Oregon  corporation,  for  example,  one  estimate  might 
be  obtained  from  the  manager  at  the  mill,  another  from  the 
treasurer  at  Portland,  and  another  from  the  president  in 
Wisconsin.  Sometimes  such  estimates  differed  widely. 
There  might  be  additional  estimates  from  persons  holding 
less  responsible  positions  in  the  company,  or  wholly  uncon- 
nected with  it,  such  as  cruisers  who  had  examined  the 
timber  for  the  present  owner  or  for  others.  In  some  States, 
notably Washington, estimates had been made by public
officers  for  purposes  of  taxation,  and  these  records  were 
carefully  considered.  All  the  available  estimates  for  each 
holding  separately  reported  were  transferred  in  the  office 
from  the  original  reports  to  a  single  tabulation  sheet,  so  that 
they  could  be  readily  compared.  Then  the  evidence  was 
carefully  weighted,  with  due  regard  to  the  position,  means 
of knowledge, and apparent credibility of each informant.
The  estimate  finally  set  down  was  a  result  of  the  considera- 
tion and  balancing  of  testimony  from  many  sources,  often 
conflicting. In every case, an effort was made to arrive
at the best possible judgment; but the care and time devoted
to the effort were increased with the importance of
the specific case.
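The Bureau weighed the testimony by judgment and the report states no numerical rule for it. Purely as an illustration of what balancing several estimates might look like if the weights were written down, the sketch below takes a weighted mean in Python. The reports, the weights, and the name `balanced_estimate` are all hypothetical.

```python
def balanced_estimate(reports):
    """Weighted mean of (estimate, weight) pairs.

    The weight stands for the credit given to an informant; it is an
    assumption of this sketch, not a quantity the Bureau recorded.
    """
    total_weight = sum(w for _, w in reports)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(e * w for e, w in reports) / total_weight


# Hypothetical reports on one holding, in million board feet.
reports = [
    (400, 3),   # manager at the mill
    (520, 2),   # treasurer
    (650, 1),   # president
    (430, 3),   # cruiser who examined the tract
]

print(round(balanced_estimate(reports), 1))
```

A fixed formula of this kind cannot replace the examination of why the estimates differ, which is what the tabulation sheet was meant to make possible.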

Before determining the final estimates placed on these
"company sheets" (each company sheet showing the estimates
for that particular holding, by counties), preliminary
tables had been prepared for each county, giving the number
of acres, estimate of timber by species, and average stand
per acre, for each separately reported holding in that county.
These preliminary "county tables" threw much light on
the estimates for particular holdings, for with the help of
the county maps the average stands reported by neighboring
owners could be compared with a view to detecting
abnormal  variations.  Again,  the  county  tables  of  par- 
ticular holdings  were  a  valuable  aid  as  a  check  on  the  gen- 
eral estimates  for  the  unenumerated  holdings  and  on  the 
total  timber  in  the  county.  Over  large  areas,  the  average 
stand  given  for  the  holdings  of  less  than  60  million  feet  was 
compared,  township  by  township,  with  the  stands  reported 
by  the  separate  holders  above  that  limit. 
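The comparison of neighboring stands can be sketched as a simple screening rule. The Python example below flags any holding whose stand per acre differs from the median of its neighbors by more than a chosen factor. The factor of 2, the holdings, and the name `flag_abnormal_stands` are assumptions for illustration only; the report describes the comparison but not a fixed threshold.

```python
from statistics import median


def flag_abnormal_stands(holdings, factor=2.0):
    """Return owners whose stand per acre is more than `factor` times
    above or below the median stand of the other holdings."""
    flagged = []
    for name, acres, timber in holdings:
        stand = timber / acres
        # Stands per acre reported for all the other holdings.
        others = [t / a for n, a, t in holdings if n != name]
        typical = median(others)
        if stand > factor * typical or stand < typical / factor:
            flagged.append(name)
    return flagged


# Hypothetical holdings in one township:
# (owner, acres, timber in thousand board feet).
holdings = [
    ("A", 1000, 20000),   # 20 thousand feet per acre
    ("B", 2000, 44000),   # 22
    ("C", 1500, 12000),   # 8
    ("D", 800, 16800),    # 21
]

print(flag_abnormal_stands(holdings))
```

A flagged holding is only a subject for further inquiry. The stand may really be lighter or heavier than its neighbors, or the estimate may be at fault.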

When  the  data  gathered  by  the  field  work  had  been  col- 
lated in  the  office,  agents  were  sent  out  a  second  time  over 
practically  all  the  timber  area  in  the  five  States  of  the  Pacific- 
Northwest,  to  verify  and  correct  the  results.  The  agents 
now  had  in  their  hands  a  digest  of  all  reports  previously 
made,  and  the  conclusions  reached  in  the  office,  together 
with a statement of the principal points on which there
was  uncertainty.  The  maps  on  which  the  separate  hold- 
ings had  been  platted  showed  how  the  holdings  were  locally 
related  to  each  other;  which  lay  side  by  side  and  which 
were  intermingled.  Sometimes  the  map  and  the  tables 
showed that an owner's land was closely associated in location
with that of others who had reported two or three
times as much timber per acre. When this appeared, the
agent  sought  for  the  explanation.  In  some  cases  he  was 
satisfied  that  all  the  estimates  were  honestly  made  and 
reasonably  accurate;  in  others  he  obtained  admissions 
from  the  owners  themselves,  or  good  evidence  from  other 
sources,  that  some  of  the  estimates  first  given  were  far  from 
the  truth.  This  second  visit  to  the  Pacific-Northwest 
was  necessary  in  greater  part  because  of  the  unwillingness 
with  which  many  of  the  most  important  owners  there  had 
met  the  Bureau's  request,  some  of  them  giving  data  which 
were  admitted  on  the  second  visit  to  be  incorrect ;  and  in 
lesser  part  because  of  the  very  marked  change  in  the  stand- 
ards of  merchantable  timber  in  that  region.  This  has 
largely  destroyed  the  value  of  the  estimates  made  several 
years  ago,  and  many  of  the  estimates  first  given  to  the 
Bureau  were  of  this  kind.  The  aim  was  to  get  sufficient 
evidence  to  correct  all  estimates  to  an  approximate 
agreement with present-day standards of merchantable
timber. 

This  second  period  of  field  work  in  the  five  States  of  the 
Pacific-Northwest  not  only  overcame  these  two  difficulties, 
for  the  most  part,  but  also  increased  the  general  accuracy 
of the work so that the data for that region are believed
to  be  more  reliable,  according  to  current  standards,  than 
those  for  either  the  Southern  Pine  Region  or  the  Lake 
States.  In  the  course  of  the  investigation,  the  Southern 
Pine  Region  was  taken  up  first,  then  the  Lake  States,  then 
the  Pacific-Northwest,  and  after  that  the  second  visit  to 
the  last  region.  The  methods  used  developed  toward 
perfection  as  the  work  went  on,  and  the  agents  became 
more  and  more  experienced,  and  this  played  a  very  im- 
portant part  in  overcoming  the  greater  difficulties  in  the 
West. 



REVIEW 

1.  The  accuracy  of  the  estimate  of  lumber  from  logs  seems  to 
be  conditioned  by  the  measuring  scale,  the  diversity  of  conditions, 
and  the  personal  equation.  In  what  respects  is  each  of  these  in- 
volved? Do  the  "errors"  due  to  these  tend  to  compensate  each 
other  ? 

2.  What  are  the  methods  of  estimating  standing  timber?  How 
feasible  is  a  count  of  trees  and  scaling  of  the  logs  ?  Is  there  an  ele- 
ment of  bias  in  any  of  these  methods?  Why  or  why  not?  Can 
estimates  be  scientific?     Why? 

3.  State  the  principles  which  guided  the  Bureau  of  Corporations 
in  making  an  estimate  of  standing  timber.  What  methods  were 
followed?  Might  these  be  called  "drag-net"  methods?  Why? 
Does  the  method  of  "balanced  testimony"  seem  to  you  good? 
Good  for  other  purposes?     What?     Illustrate. 

4.  What  principles  of  statistical  methods  does  this  extract 
illustrate?  Would  these  be  true  of  other  problems  of  sampling 
and  estimating? 

5.  Just  how  important  in  your  judgment  is  the  personal  element 
in  this  problem? 

6.  What  standards  of  accuracy  seemed  to  be  aimed  at  here? 
Is  accuracy  always  a  relative  term?     Why? 

Sampling  in  the  Development  of  Markets  * 

The  business  man  must  first  realize  the  intricacy  of  the 
problems he has to solve. He must analyze his market.
.  .  .  The  business  man  faces  a  body  of  possible  pur- 
chasers, widely  distributed  geographically,  and  showing 
wide  extremes  of  purchasing  power  and  felt  needs.  The 
effective  demand  of  the  individual  consumer  depends  not 
alone  upon  his  purchasing  power  but  also  upon  his  needs, 
conscious  or  latent,  resulting  from  his  education,  character, 
habits, and economic and social environment. The market,
therefore, splits up into economic and social strata, as well
as into geographic sections.

* Adapted with permission from A. W. Shaw, Some Problems in Market
Distribution, Harvard University Press, 1915, pp. 100-119.

The  producer  cannot  disregard  the  geographic  distri- 
bution of  the  consuming  public.  He  may  be  able  to  sell 
profitably  by  salesmen  where  the  population  is  dense,  while 
such  method  of  sale  would  be  unprofitable  in  a  region  where 
there  is  a  sparse  population.  If  he  bases  a  judgment  upon 
the  average  cost  of  selling  by  salesmen  for  the  whole  market, 
he  may  easily  go  wrong,  since  the  average  might  show 
that  the  use  of  such  an  agency  was  on  the  whole  profitable, 
although  in  some  sections  entering  into  the  calculations 
the  use  of  salesmen  was  actually  unprofitable.  Again, 
it  might  be  economical  for  the  distributor  to  establish  his 
own  branch  stores  in  the  denser  urban  centers,  while  in 
the  sparsely  populated  regions  he  could  most  profitably 
distribute  his  product  through  the  regular  channels. 
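The danger of judging by the average for the whole market can be shown with a small calculation. In the Python sketch below, the sections, the sales and cost figures, and the assumed gross margin are all hypothetical and are not taken from the text.

```python
# Hypothetical results of selling by salesmen, by section of the market.
sections = {
    "dense urban":  {"sales": 100_000, "selling_cost": 12_000},
    "small towns":  {"sales": 40_000,  "selling_cost": 8_000},
    "sparse rural": {"sales": 10_000,  "selling_cost": 4_500},
}

# Assumed share of sales left to cover selling cost and profit.
GROSS_MARGIN = 0.25


def net_return(sales, selling_cost, margin=GROSS_MARGIN):
    """Margin earned on sales, less the cost of making those sales."""
    return sales * margin - selling_cost


# Result for the whole market taken together.
overall = net_return(
    sum(s["sales"] for s in sections.values()),
    sum(s["selling_cost"] for s in sections.values()),
)

# Result for each section taken separately.
by_section = {name: net_return(**s) for name, s in sections.items()}

print("whole market:", overall)
for name, value in by_section.items():
    print(name, value)
```

With these figures the whole market shows a positive return, yet the sparsely settled section shows a loss, which the total conceals.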

If, then, a sound system of distribution is to be established,
the business man must realize that each distinct
geographic  section  is  a  separate  problem.  The  whole 
market  breaks  up  into  differing  regions. 

Equally  important  is  a  realization  of  what  may  be  termed 
the  market  contour.  The  market,  for  the  purposes  of  the 
distributor,  is  not  a  level  plain.  It  is  composed  of  the  dif- 
fering economic  and  social  strata.  Seldom  does  the  ordinary 
business  man  appreciate  the  market  contour  in  reference 
to  his  product.  Yet  obviously  the  success  of  the  pro- 
ducers of  trade-marked  hats  depends  upon  a  realization 
of  this  element  of  market  contour.  The  distributor  of  a 
staple  hat  at  $3.00  appeals  to  different  economic  and  social 
strata,  faces  different  considerations,  and  finds  different 
selling  methods  necessary,  as  compared  with  distributors 
selling a $5.00 trade-marked hat, or those distributors selling
$4.00 or $6.00 trade-marked hats. Differences in
economic and social strata to be reached are as important
as  differences  in  geographic  location  and  density,  if  a  sound 
system  of  distribution  is  to  be  worked  out. 

Take  the  distributor  who  seeks  to  map  out  a  selling  cam- 
paign for  a  Catholic  publication.  It  is  essential  that  he  take 
into  account  not  merely  the  geographic  distribution  of  the 
Catholic population in the United States, the regions where
it  is  relatively  dense,  and  the  regions  where  it  constitutes  a 
small  element  in  the  population,  but  also  he  must  take  into 
account  the  distribution  of  that  population  through  the  eco- 
nomic strata  of  society.  A  method  of  distribution  successful 
in  New  Orleans,  where  the  Catholic  population  is  dense  and 
spread  through  all  economic  strata  of  society,  might  well  fail 
if  applied  in  Maine,  where  the  Catholic  population  is  rela- 
tively sparse  and  found  mostly  in  the  lower  economic  strata. 

A  careful  analysis  of  his  market,  then,  by  areas  and  by 
strata,  is  the  first  task  of  the  modern  distributor. 

Choice  of  Agencies  in  Distribution 

Nor  does  the  merchant-producer  ordinarily  realize  how 
intricate  is  his  problem  as  to  the  agency  or  combination 
of  agencies  that  will  be  most  efficient  in  reaching  his 
market.  .  .  .  The  business  man  often  adopts  one  method 
and  becomes  an  advocate  of  it,  disregarding  entirely  other 
methods.  While  the  method  adopted  may  be  more  effi- 
cient than  any  other  single  method,  it  is  apparent  that  a 
method  which  is  relatively  efficient  in  reaching  one  area 
may  be  inferior  to  another  method  in  reaching  another  area. 
And  so  a  system  of  distribution  which  has  proved  very  ef- 
fective in  reaching  one  economic  stratum  may  be  relatively 
inefficient  when  employed  to  reach  a  different  economic 
stratum  in  society. 



The  problem,  then,  of  working  out  the  most  effective 
combination  of  agencies  is  a  most  complicated  one.  Each 
distinct  area  and  economic  stratum  must  be  treated  as  a 
separate  problem,  and,  moreover,  the  economic  generaliza- 
tions embodied  in  the  law  of  diminishing  returns  must  be 
taken  into  account  in  choosing  that  combination  of  selling 
agencies  which  will  give,  in  the  aggregate,  the  most  effi- 
cient organization  of  the  market. 

Thus  the  distributor  may  find  as  he  extends  his  opera- 
tions in  his  immediate  territory,  geographically,  that  his 
selling  cost  steadily  decreases,  but  that  when  he  further 
extends  his  market  the  selling  cost  increases.  He  may 
find  that  in  more  distant  areas  selling  by  salesmen  ceases 
to  be  profitable,  and  there  he  will  perhaps  establish  a  more 
economical  system  of  selling  by  a  combination  of  salesmen 
and  circular  letters.  That  is,  he  may  reduce  the  number 
of  visits  by  salesmen  by  one  half,  and  supplement  their 
efforts  by  a  series  of  circular  letters  or  more  personal  cor- 
respondence. In  even  more  distant  areas,  it  may  be  nec- 
essary to  eliminate  the  salesmen  entirely  and  to  sell  only 
by  direct  advertising.  .  .  . 

A  sound  selling  policy,  then,  must  be  built  up  on  a  careful 
analysis  of  the  market  by  areas  and  strata,  and  upon  a 
detailed  study  of  the  proper  agency  or  combination  of  agen- 
cies to  reach  each  area  and  stratum,  taking  into  account 
always  the  economic  generalizations  expressed  in  the  law 
of  diminishing  returns.  It  must  also  take  into  account 
not  only  the  direct  results  obtained  from  the  use  of  one 
or  the  other  agency  over  a  short  period,  but  also  the  less 
measurable  results  represented  by  the  unexpressed  con- 
scious demand  and  subconscious  demand,  which  go  to  aid 
future selling campaigns.

All this tends rather to give a general sense of direction
than to serve as a practical and tangible method of handling
a  specific  problem  of  distribution.  A  clear  grasp  of  the 
problem  through  a  careful  analysis  is  the  first  step  in  solv- 
ing difficulties.  To  suggest  any  cure-all  or  even  any  panacea 
for  the  existing  maladjustments  in  distribution,  even  were 
it  possible,  is  not  the  purpose  of  this  paper.  The  very  com- 
plications revealed  by  analysis  indicate  the  inadequacy 
of any single remedy. But it is possible to face the problem
of remedy as well as of diagnosis in a scientific spirit,
to introduce what may be termed the "laboratory method."

Laboratory  Study  of  Distribution 

The  crux  of  the  distribution  problem  is  the  proper  exer- 
cise of  the  selling  function.  The  business  man  must  con- 
vey to  possible  purchasers  through  one  agency  or  another 
such  ideas  about  the  product  as  will  create  a  maximum 
demand  for  it.  This  is  the  fundamental  aim,  whatever 
the  agency  employed.  Hence  this  is  the  point  where  a 
scientific  study  of  distribution  must  first  be  applied.  How 
is  the  business  man  to  determine  what  ideas  are  to  be  con- 
veyed to  the  possible  purchaser  and  what  form  of  expres- 
sion is  best  adapted  to  such  conveyance  ? 

Here,  as  elsewhere  in  distribution,  the  ordinary  business 
man  is  to-day  working  by  rule  of  thumb.  He  guesses  at  the 
suitable  ideas  and  forms  of  expression,  and  gambles  on  his 
guess.  On  the  basis  of  his  a  priori  selection  of  ideas  fitted 
to  build  up  a  demand  for  his  product  and  of  a  form  of  expres- 
sion suited  to  convey  the  ideas  effectively,  he  invests  tens, 
even  hundreds  of  thousands  of  dollars  in  a  selling  campaign. 

The more able business men, to be sure, seek to determine
those facts about their goods that will attract the attention
of the possible purchaser and awaken in him the desired
reaction, that is, a desire for the article. They study
in a general way the points of superiority in quality and
service possessed by their products as compared with other
goods of like kind.

They also seek guides as to the form in which the ideas
should be conveyed, in the general principles of style, all
based on the fundamental notion of conserving the prospective
purchaser's mental energy by cutting down the
friction of communication. They know, for instance, that
they should use short familiar words expressing their exact
shade of meaning; that they should give preference to figurative
language; that they should suggest a concrete
image only after the materials of which it is to be made are
conveyed; that they should avoid abstractions and generalizations
where possible; that when they are suggesting
the reaction desired their language should become quick,
sharp, and compelling.

These  things  the  more  efficient  business  men  know  and 
apply.  But  all  this  is  a  priori.  The  need  is  for  a  method 
of  practical  test  that  will  enable  us  to  try  out  selling  ideas 
and  forms  of  expression,  under  laboratory  conditions,  as  it 
were,  before  the  investment  of  thousands  and  hundreds  of 
thousands  of  dollars  is  staked  on  the  success  of  the  selling 
campaign. 

Mention  has  been  made  of  the  annual  expenditure  of 
not  less  than  a  billion  dollars  in  advertising.  Unques- 
tionably an  extremely  large  percentage  of  this  is  wasted. 
This  means  not  merely  individual  loss,  but  social  loss.  It 
is  a  diversion  of  capital  and  productive  energy  into  un- 
profitable channels. 

The causes of this waste are numerous. The commodity
in question may be one not possessing those elements of
quality and service which constitute the basis for a demand
on the part of the consuming public. If the goods advertised
are not adapted to satisfy a need, conscious or subconscious,
of consumers, the advertising cannot be effective.
Attempting to sell a thing that nobody needs is wasted effort.

Again,  the  medium  used  for  the  communication  of  the 
ideas  about  the  goods  may  not  be  one  that  reaches  the 
particular  economic  or  social  stratum  in  which  possible  pur- 
chasers of  the  commodity  lie.  Hence  the  ideas  fail  to 
create  a  demand  because  they  do  not  reach  those  in  whom 
a  latent  need  for  the  commodity  exists. 

Another  important  cause  of  advertising  waste  lies  in  the 
failure  to  take  advantage  of  aroused  demand.  The  dis- 
tributor often  fails  to  give  proper  attention  to  the  matter 
of  the  physical  supply  of  his  product.  There  results  a  con- 
siderable leakage  in  demand  from  the  inability  of  persons 
in  whom  a  demand  has  been  created  to  obtain  the  goods 
at  the  time  when  desired. 

But  the  great  cause  of  waste  is  probably  the  fact  that 
the  ideas  about  the  goods,  or  the  form  in  which  those  ideas 
are  conveyed  to  possible  purchasers,  prove  ill-adapted  to 
secure  the  desired  reaction,  and  thus  to  create  in  the  con- 
sumer an  effective  demand. 

If  we  can  apply  to  this  pressing  problem  of  advertising 
waste  methods  of  study  which  have  proven  efficient  in 
other  fields,  the  gain  is  clear.  The  engineer  does  not  choose 
material  for  a  bridge  by  building  a  bridge  of  material  and 
waiting  to  see  whether  it  stands.  He  first  tests  the  ma- 
terial in  the  laboratory.  That  is  what  the  business  man 
must  do. 

The statistician turns in his problems to the law of averages.
He is familiar with what are termed mass phenomena.
He knows that he can learn something of the average height
of a body of people by studying the heights in a group of a
few thousands of people drawn at random from the larger
body. Provided that the smaller group is so selected as to
insure that it is typical of the larger body, and provided
the group is large enough to render the law of averages
applicable, the statistician knows when he has determined
the average height of the smaller group that it will roughly
coincide with the average height of the larger group.
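
The statistician's claim can be tried out numerically. The following is a minimal sketch only: the population of 250,000 heights is invented for illustration (no such figures appear in the text), and the function name is hypothetical. It draws a random sample of a few thousand and sets its average beside that of the whole body.

```python
import random
import statistics


def sample_mean_vs_population(population, sample_size, seed=0):
    """Return (population mean, mean of a simple random sample)."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    return statistics.fmean(population), statistics.fmean(sample)


# Hypothetical population: heights in inches, roughly bell-shaped.
_rng = random.Random(1)
population = [_rng.gauss(67.0, 3.0) for _ in range(250_000)]

pop_mean, samp_mean = sample_mean_vs_population(population, 3_000)
print(f"population mean: {pop_mean:.2f}")
print(f"sample mean:     {samp_mean:.2f}")
```

The two averages agree closely only because the sample is drawn at random; a sample taken from one corner of the population would carry no such assurance, which is the proviso the text insists upon.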

This  method  of  study  can  be  applied  by  the  business 
man  in  testing  the  ideas  and  forms  of  expression  to  be  used 
in a selling campaign. In direct advertising, the mailing
of  selling  letters,  circulars,  or  catalogues  to  prospective 
purchasers  to  draw  from  them  an  order  for  goods  as  an 
evidence  of  awakened  demand,  you  have  a  stimulus  and  re- 
sponse adapted  to  direct  statistical  measurement.  The 
number  of  responses  per  thousand  communications  can  be 
determined.  Here  is  the  agency  that  the  business  man 
can  employ  in  testing,  under  what  are  equivalent  to  lab- 
oratory conditions,  the  ideas  and  forms  of  expression  that 
seem  to  him  best  adapted  to  awaken  a  demand  for  his 
product. 

Suppose  the  manufacturer  of  a  food  product  is  planning 
a  campaign  to  reach,  not  the  consumer,  but  the  grocers 
of  the  country.  Now  the  whole  body  of  dealers,  large 
and  small,  handling  groceries  numbers  something  like 
250,000.  Let  the  distributor,  after  working  out  a  set  of 
ideas  and  forms  of  expression  which  seem  to  him  likely  to 
be  effective  in  arousing  the  desired  demand,  test  this  ma- 
terial by  mailing  it  to  say  1000  grocers.  The  group  se- 
lected must  be  large  enough  to  give  typical  results  and  it 
must  be  so  selected  as  to  be  representative  in  character 
of  the  whole  body  of  grocers. 

Granting these elements, the distributor can determine
the number of responses from the 1000 grocers to whom
the communication was sent, and can estimate from that
result  the  average  response  per  thousand  of  communica- 
tions that  would  have  been  obtained  if  the  same  ideas  in 
the  same  form  of  expression  had  been  conveyed  to  the 
whole  body  of  250,000  dealers  in  groceries  in  the  country. 
He  can  then  test  by  means  of  direct  mailing  to  another  group 
of  1000,  a  varying  set  of  ideas  or  varying  form  of  expression. 
And  so  on  with  other  modifications  of  the  selling  material. 
Thus  it  will  be  possible  to  determine  what  ideas,  in  what 
arrangement  and  in  what  form  of  expression,  are  most 
effective  to  arouse  the  desired  demand. 
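
The arithmetic of such a test can be sketched as follows. The order counts for the two versions of the copy are hypothetical, and the binomial standard error is an assumption added here (it supposes at most one order per piece mailed, each independent of the others); the text itself gives no such formula. Only the figure of 250,000 grocers comes from the passage above.

```python
import math


def per_thousand(orders, pieces):
    """Orders received per thousand pieces mailed."""
    return 1000.0 * orders / pieces


def standard_error_per_thousand(orders, pieces):
    """Approximate binomial standard error of the per-thousand rate.

    Assumes each piece mailed yields at most one order, independently.
    """
    p = orders / pieces
    return 1000.0 * math.sqrt(p * (1.0 - p) / pieces)


def projected_orders(orders, pieces, full_list_size):
    """Scale the test rate up to the whole list."""
    return full_list_size * orders / pieces


# Hypothetical test results for two versions of the copy.
copy_a = {"orders": 22, "pieces": 1000}
copy_b = {"orders": 35, "pieces": 1000}
FULL_LIST = 250_000  # the body of grocers mentioned in the text

for name, t in (("A", copy_a), ("B", copy_b)):
    rate = per_thousand(t["orders"], t["pieces"])
    se = standard_error_per_thousand(t["orders"], t["pieces"])
    total = projected_orders(t["orders"], t["pieces"], FULL_LIST)
    print(f"copy {name}: {rate:.1f} per M (+/- {se:.1f}), "
          f"about {total:.0f} orders from the full list")
```

The standard error gives a rough idea of how far a test of 1000 pieces may wander from the true rate, and hence whether the difference between two versions of the copy is worth acting upon.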

That the plan suggested is practical is indicated by the results
of such an intensive study presented in the table below.
Here are shown the results of "tests" and the results of
complete mailings. The tests here covered only one stratum
of society, a mailing list of bankers being used. The purpose
of the selling material mailed was to obtain orders for
certain publications. Various forms of "copy" were tested
by mailing, usually to 500 names on the list. Where the
return on any test exceeded the minimum standard of
twenty orders per thousand communications the material
was mailed to the complete list. In only one case did the
complete mailing fail to show an average return per thousand
communications substantially the same as that derived
from the test mailing. In the case of Test D² mailed
September 15, 1909, the return is clearly out of proportion
to the results from the mailing. The same material mailed
on the same date, however (Test D¹), gives for a similar
small group a return much closer to the results obtained
from the final mailing. When a minimum standard as
low as twenty is used, and the test group numbers only
500, there is danger that the average will be disturbed, as
by one individual sending in several orders. The larger
the test group the more exact an index will it give as to
the results which will be obtained from a complete mailing.

                          Bankers' Tests
                    Minimum Standard = 20 per M

                        TESTS                            MAILINGS
Material   Date    Pieces  Orders      No.     Date    Pieces  Orders      No.
Mailed             Mailed  Received    per M           Mailed  Received    per M

           1909                               1909
A¹         3/30       500      3        6
A²         3/30       500      5       10
B¹         8/13       500      6       12
B²         9/13       500      3        6
C¹         9/15       500      4        8
C²         9/15       500      3        6
D¹         9/15       453      6 }     25     9/27    19,943    360     18
D²         9/15       500     18 }
E          9/16       500      7       14
F          9/21       500     24 }     36     11/23   16,511    589     35
           9/21       500     12 }
G          10/18    1,000     30       30     11/28   21,790    643     29.5
                                              1910
H          11/16      500     11       22     1/24     6,554    165 }   24
                                              1/24    16,039    390 }
           1910
I          4/11       500     12 }     24     5/5      6,810    145 }   25
           4/11       500     12 }            5/4     12,154    336 }

Note. Where the same letter appears with different exponents under
"material mailed" it indicates that on the test mailing results were kept
separately for the same material mailed to two small groups.
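
The agreement between tests and complete mailings can be checked from the table's own figures. In the sketch below the counts are taken from the table for the five materials that went to the complete list, with paired tests and paired mailings pooled. The comparison in units of a binomial standard error is an added assumption, not part of the original study, and the function names are illustrative.

```python
import math

# (material, test pieces, test orders, mailing pieces, mailing orders),
# with paired tests and paired mailings pooled.
BANKERS = [
    ("D", 953, 24, 19_943, 360),
    ("F", 1_000, 36, 16_511, 589),
    ("G", 1_000, 30, 21_790, 643),
    ("H", 500, 11, 22_593, 555),
    ("I", 1_000, 24, 18_964, 481),
]


def per_m(orders, pieces):
    """Orders per thousand pieces mailed."""
    return 1000.0 * orders / pieces


def expected_se_per_m(mailing_rate_per_m, test_pieces):
    """Binomial standard error (per M) expected of a test of this size
    if the true rate equals the full-mailing rate."""
    p = mailing_rate_per_m / 1000.0
    return 1000.0 * math.sqrt(p * (1.0 - p) / test_pieces)


def compare(rows):
    """Return (material, test rate, mailing rate, gap in standard errors)."""
    out = []
    for name, t_pieces, t_orders, m_pieces, m_orders in rows:
        t_rate = per_m(t_orders, t_pieces)
        m_rate = per_m(m_orders, m_pieces)
        se = expected_se_per_m(m_rate, t_pieces)
        out.append((name, t_rate, m_rate, (t_rate - m_rate) / se))
    return out


for name, t_rate, m_rate, gap in compare(BANKERS):
    print(f"{name}: test {t_rate:5.1f}  mailing {m_rate:5.1f}  "
          f"gap {gap:+.1f} standard errors")
```

On these figures material D shows the widest gap between test and mailing, which accords with the exception noted in the text. In a test of 500 pieces a single order moves the rate by 2 per M, so one buyer sending several orders can readily carry a test across the minimum standard of 20.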

This  method  of  studying  ideas  and  forms  of  expression 
in  direct  advertising  would  be  important,  even  though  its 
usefulness  did  not  extend  beyond  direct  advertising.  It 
would  permit  one  to  guide  a  widely  extended  direct  ad- 
vertising campaign  by  an  investigation  relatively  inexpen- 
sive. 



But the importance of the method described does not
end with direct advertising. Remember that the root
idea is the same, whatever the agency for selling employed.
Selling is accomplished by communicating to the possible
purchaser ideas about the goods calculated to stimulate
in him a desire for the goods. These ideas may be communicated
through middlemen, salesmen, general advertising,
or direct advertising. Since the ideas are the same, whatever
the agency for communication, the business man can
determine in his direct selling laboratory what ideas and
in what combination are the most effective selling material.
He can then carry over into his selling by other agencies
the knowledge there obtained.

Suppose an extensive campaign through periodicals is
under consideration. The distributor contemplates spending
perhaps hundreds of thousands of dollars upon advertising
in certain periodicals. What can the "distribution
laboratory" do to determine the ideas to be conveyed
and the forms of expression to be used to create the
desired demand? Now the circulation of a periodical to
be used may run into the hundreds of thousands or even
into the millions. The business man wishes to test the
response that will result from the communication to this
enormous body of subscribers of certain ideas expressed
in certain forms. Not only can he work out the most
effective ideas, the most effective arrangement, and the
most effective forms of expression through the agency
of direct mailing, but he can even test the final "copy"
itself, just as it will appear in the periodical, by mailing
it directly to relatively small groups.

Moreover, he can test the response to it found in differing
strata of society. Ideas adapted to build up a demand
for a commodity in one economic or social stratum may
prove ineffective when dealing with another. The importance
of this method lies in the fact that most periodicals
circulate within certain fairly well-defined economic and
social strata. The ideas and forms of expression that
are most effective in one periodical hence may be relatively
ineffective if used in another that reaches a different stratum.

Equally  important  is  the  application  of  the  suggested 
method  of  study  to  selling  through  salesmen.  The  more 
progressive business men to-day train the salesmen in a certain
basic "selling talk." That is, certain ideas, arranged
in  a  certain  order  and  expressed  in  certain  forms,  are  im- 
pressed upon  them  as  likely  to  build  up  a  demand  for  the 
article  on  the  part  of  possible  purchasers.  The  basic  "sell- 
ing talk"  is  not,  of  course,  repeated  parrot-like  by  the 
salesman,  but  it  does  serve  as  a  foundation  for  his  talks  to 
possible  buyers. 

Here  again  the  laboratory  idea  can  be  applied.  The 
whole  structure  of  the  selling  talk  can  be  built  up  on  the 
ideas,  order  of  arrangement,  and  forms  of  expression  es- 
tablished as  the  most  efficient  in  creating  demand  through 
the  medium  of  direct  advertising.  One  need  but  appre- 
ciate the  fundamental  identity  of  the  selling  function, 
through  whatever  agency  exercised,  to  realize  that  the  re- 
sults obtained  in  experiments  in  direct  advertising  can  be 
carried  over  to  selling  by  salesmen. 

Note, too, that the general principles upon which the
"testing" method depends, apply when we seek to study
the possibilities of the whole market by the intensive cultivation
of one section of it. A localized selling campaign,
narrow in extent, will give relatively exact data from which
the possibilities of a nation-wide campaign of like character
may be judged. Obviously, if our law of averages
holds good, we may carry over the results obtained in one
section to other sections, and hence at small cost guide a
widespread campaign.

The exact data that can be obtained through such "testing"
methods permit a more scientific consideration of the
decreasing returns obtained if one agency is used beyond
a certain point. Hence a better combination of agencies
is possible, with a view to the greatest aggregate efficiency.

The  Effect  of  Different  Price  Policies 

When a business man contemplates putting a new product
on the market, a serious problem is the price at which
it  shall  be  sold.  In  the  introduction  of  a  safety  razor,  for 
instance,  at  what  price  is  it  to  be  sold?  In  such  a  case  the 
business  man  seeks  to  determine  which  price  will  give  him 
the  best  net  return,  all  things  considered.  Now  the  method 
of  study  developed  above  will  permit  the  business  man  to 
determine  by  actual  test  the  effective  demand  that  can  be 
built  up  at  different  price  levels  in  different  economic  and 
social  strata.  Hence  he  can  fix  the  price  on  the  basis  of  rela- 
tively exact  data,  rather  than  on  a  mere  guess. 
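
A price test of this kind reduces to a small calculation. In the sketch below every figure is hypothetical: the three prices, the orders per thousand supposed to have been drawn by test mailings at each price, the unit cost, and the cost of mailing a thousand pieces. The text supplies none of them, and "net return" is taken here, as an assumption, to mean margin on the orders received less the cost of the mailing.

```python
def net_return_per_thousand(price, unit_cost, orders_per_m, mailing_cost_per_m):
    """Net return from mailing a thousand pieces at a given price."""
    return orders_per_m * (price - unit_cost) - mailing_cost_per_m


def best_price(tests, unit_cost, mailing_cost_per_m):
    """Pick the tested price with the largest net return.

    `tests` maps each price to the orders per thousand observed
    in a test mailing made at that price.
    """
    return max(
        tests,
        key=lambda p: net_return_per_thousand(
            p, unit_cost, tests[p], mailing_cost_per_m
        ),
    )


# Hypothetical test results for a safety razor offered at three prices.
razor_tests = {1.00: 40, 2.50: 28, 5.00: 10}
choice = best_price(razor_tests, unit_cost=0.60, mailing_cost_per_m=20.0)
print(f"price with best tested net return: ${choice:.2f}")
```

The same tabulation can be kept separately for each economic or social stratum tested, since the price that pays best in one stratum need not pay best in another.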

Again  the  laboratory  method  here  suggested  lends  itself 
to  a  determination  of  what  elements  of  quality  and  service 
in  a  given  product  are  deemed  most  essential  by  the  con- 
sumer. The  effectiveness  of  the  ideas  conveyed  in  build- 
ing up  a  demand  reflects  the  intensity  of  human  wants  as  to 
the  elements  of  quality  and  service  described.  The  pro- 
ducer can  sound  the  consumer  and  can  better  adapt  his 
product  to  the  consumer's  felt  needs. 

Thus  an  entire  selling  campaign  can  be  directed  on  the  basis 
of  what  may  be  termed  laboratory  study.  The  empirical 
methods  of  the  ordinary  business  man  may  be  supplemented 
by  scientific  methods  that  have  proven  efficient  in  other  fields. 



The  above  practical  suggestions  have  been  directed 
primarily  to  the  business  man  struggling  with  his  immediate 
problems.  Yet  it  may  be  well  to  emphasize  once  more  the 
social  importance  of  the  suggestions.  It  is  not  merely  that 
a  large  annual  waste  in  advertising  can  be  eliminated.  Our 
whole  system  of  distribution  is  in  chaos.  And  the  chaotic 
conditions  in  distribution  mean  that  matter  is  ill  adjusted 
in  form  and  place  to  human  wants.  Only  as  systematic 
and  widespread  study  along  the  lines  indicated  is  given 
to  the  problems  of  distribution,  can  we  build  up  an  or- 
ganized body  of  knowledge  as  to  the  facts  and  principles 
involved.  And  only  on  the  basis  of  an  organized  body  of 
knowledge  about  distribution  can  we  hope  to  work  out  a 
more  efficient  organization  of  distribution. 

And  to  this  end  the  business  man  must  cooperate  with 
the  scientist  of  the  university.  Much  can  be  done  by  the 
trained  student  in  his  laboratory  or  in  his  study  that  will 
be  of  practical  value  in  making  possible  a  more  efficient 
organization  of  distribution.  The  experimental  psycholo- 
gist can  do  much  to  work  out  general  principles  that  will 
aid  the  business  man  in  solving  definite  selling  problems. 
The  difficulty  has  been  that  the  laboratory  worker  does 
not  have  the  specific  problems  of  the  business  man  brought 
to  his  attention. 

Similarly,  the  universities,  through  investigators  trained 
in  economics,  can  gather  and  correlate  data  upon  distribu- 
tion that  will  be  of  enormous  practical  value.  They 
should,  through  research  bureaus,  study  such  problems  as 
the  cost  of  distribution  in  the  various  industries  at  differ- 
ent stages.  And  gradually  a  body  of  organized  knowledge 
of  the  actual  facts  of  business  will  arise.  It  is  by  develop- 
ment along  such  lines  that  future  improvements  in  the 
system  of  distribution  will  be  made  possible. 



REVIEW 

1.  What  is  the  writer's  idea  of  a  market?  Contrast  market 
area  and  market  contour. 

2.  How  does  the  writer  support  the  following  thesis  with  respect 
to markets: "A clear grasp of the problem through a careful
analysis is the first step in solving difficulties"?

3.  What  is  the  "laboratory  method"  in  business  analysis? 
What  claim  has  it  to  be  called  "scientific"?  Contrast  it  with  that 
known  as  a  priori. 

4.  Illustrate  the  application  of  the  laboratory  method  to  ad- 
vertising. How  does  the  law  of  averages  in  mass  phenomena  apply 
here?  Is  the  case  different  in  the  determination  of  price  policies? 
Why? 

THE MEASUREMENT OF THE RATE OF FACTORY OUTPUT *

Enumeration 

The  enumeration  of  any  type  of  output  depends  upon  its 
uniformity  and  its  divisibility  into  units. 

The  first  task  for  every  investigator  proposing  to  use  out- 
put as  a  measure  of  working  capacity  is  to  find  uniform  opera- 
tions performed  throughout  the  period  to  be  studied.  At  a 
large munition factory an attempted comparison of the different
weeks' output of certain girls nominally on the same work
was  made  impossible  in  the  majority  of  cases  owing  to  the 
fact  that  the  girls  were  not  really  continuously  on  the  same 
operation.  One  week  a  particular  girl  working  on  a  capstan 
lathe  was  set  to  make  one  part  of  a  fuse,  in  another  week  or 
even  in  the  same  week  she  was  making  another  part,  of  quite 
different  complexity,  and,  therefore,  with  a  quite  different 
rate of output per hour. Indeed over the whole factory it
was only in one 18-pound shell cartridge case department
and in the work of six girls in the fuse department that the
kind of output was found sufficiently uniform over a long
period for purposes of enumeration.

* Adapted with permission from Florence, Philip S., "Use of Factory
Statistics in the Investigation of Industrial Fatigue," Studies in History,
Economics and Public Law, Columbia University, Vol. LXXXI, No. 3, 1918,
pp. 39-55.

The  investigator  should  be  especially  on  his  guard  that 
products  known  by  the  same  name  are  not  of  slightly  differ- 
ent size,  or  for  some  other  reason  do  not  vary  in  the  effort 
required  to  make  them.  The  output  of  an  individual  may  be 
recorded  on  paper,  as  so  many  unit  "boxes,"  but  when  the 
matter  is  investigated  the  actual  output  will  be  found  to 
fall  into  various  amounts  of  say  2-ounce  boxes,  3-ounce  boxes, 
4-ounce  boxes,  with  no  common  measure  of  the  respective 
requirements  of  each  in  the  amount  of  activity  exerted. 

Where  the  output  is  thus  of  various  kinds,  a  sort  of  com- 
mon denominator  may  sometimes  be  found  for  all  the  varie- 
ties in  the  amount  of  piece  wages  earned,  or  where  the  task 
bonus  system  has  been  introduced  in  the  degree  of  efficiency 
attained.  The  accuracy  of  this  denominator  would  depend 
of  course  on  whether  the  piece  rate  or  percentage  efficiency 
was  estimated  exactly  proportionately  to  the  comparative 
effort  required  of  the  worker  as  between  different  varieties 
of  output.  My  own  experience  with  the  measurement  of 
working  capacity  by  piece  rates  and  by  efficiencies,  even 
where  these  had  been  estimated  by  the  most  careful  time 
and  motion  study,  was  unfavorable  to  the  use  of  such  com- 
mon denominators.  In  one  factory  that  I  visited  the  amount 
of  task  bonus  paid  for  many  processes  depended  on  the  per- 
centage of  efficiency  attained,  and  much  trouble  was  taken 
to  insure  that  100  per  cent  efficiency  in  each  variety  of  work 
entailed  exactly  similar  effort  on  the  part  of  the  worker. 
Now in many departments a great fall had been taking place
in the efficiency attained. But it was admitted by representatives
of the firm itself that this fall was probably due
merely to a change from one kind of work to another. At
my request a study of this factor was made in one department,
and there it was seen that "efficiency" clearly varied
according to the variety of work being performed. It
seemed impossible to compare numerically the degree of
effort required in different work.

This  difficulty,  of  course,  in  no  way  nullifies  the  calcula- 
tion by  piece  rate  earnings  or  by  efficiencies  where  the  same 
kind  of  output  is  being  produced  throughout.  If  the  record 
of earnings and efficiency is more accessible, by all means let
it  take  the  place  of  the  direct  output  record.  .  .  . 

Comparisons  of  the  cost  of  labor  as  a  common  denominator 
for  all  varieties  of  work  will  give  a  still  rougher  measure  of 
working  capacity.  It  does  not  avoid  the  discrepancy  be- 
tween comparative  piece  rates  and  comparative  effort,  and 
in  addition  raises  discrepancies  in  the  actual  computation 
of  the  cost. 

If  a  worker  is  employed  on  different  operations  it  may  be 
possible  to  select  for  comparison  the  output  rate  of  any  one 
operation  that  recurs  regularly  at  intervals.  The  difficulty 
here,  however,  is  that  the  output  rate  of  the  operation  that 
is  selected  will  be  affected  by  the  degree  of  effort  required  on 
the  various  operations  preceding  it ;  and  at  each  recurrence 
of  the  operation  studied,  the  preceding  operations  may  be 
different. 

Operations  that  result  in  a  quantity  of  units  being  produced 
are  confined  to  what  the  manufacturers  and  workers  usually 
call  repetition  work.  How  many  such  units  must  for  statis- 
tical purposes  be  produced  per  day  depends  on  the  period 
studied.  If  the  hourly  output  is  being  compared  the  repeti- 
tion must  obviously  be  more  frequent  than  if  only  the  daily 
output is the subject of comparison. To show variations as
between different periods with any exactness at least three
units should be produced on the average in each period compared.
Sometimes the timing of output is given not as the
number  of  units  per  hour  or  per  day  but  as  the  number  of 
minutes  or  hours  per  unit.  This,  however,  is  easily  translated 
to  units  per  period  and  the  same  rules  as  to  frequency  apply. 
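
The translation from time per unit to units per period, and the rule of at least three units per period, can be sketched as follows. The operation taking 25 minutes per unit and the working day of nine hours are hypothetical figures chosen for illustration, and the function names are not drawn from the text.

```python
def units_per_period(minutes_per_unit, period_minutes=60.0):
    """Convert a time-per-unit figure into units per period."""
    if minutes_per_unit <= 0:
        raise ValueError("minutes_per_unit must be positive")
    return period_minutes / minutes_per_unit


def frequent_enough(minutes_per_unit, period_minutes=60.0, minimum_units=3.0):
    """True if the operation yields at least `minimum_units` per period."""
    return units_per_period(minutes_per_unit, period_minutes) >= minimum_units


# Hypothetical operation taking 25 minutes per unit.
print(units_per_period(25))                      # units per hour
print(frequent_enough(25))                       # compared hour by hour
print(frequent_enough(25, period_minutes=540))   # compared by a 9-hour day
```

An operation too slow to serve for hourly comparison may thus still serve for daily comparison, which is the sense in which the required frequency depends on the period studied.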

Luckily  for  the  investigator,  though  possibly  not  for  the 
workers  themselves,  such  frequently  repeated  work  has  been 
increasing  under  the  modern  factory  system  owing  to  the 
continual  replacement  of  men  by  machines  and  the  continual 
division  of  labor.  Work  is  stereotyped  and  work  is  clearly 
defined. This applies very particularly to the munitions
industry where products have to be made according to government
"specifications." The munitions industry accordingly
supplies a very fine field for output records.

Appended  is  a  list  of  a  few  processes  producing  enumerable 
units  that  are  sufficiently  repetitive  to  have  been  used  either 
by  the  present  writer  or  by  fellow-investigators  as  measures 
of  working  capacity. 

Packing Processes.

    Straightening rods or cans with a hammer.
    Sticking labels on standard-sized cans.
    Soldering lids on standard-sized cans.
    Filling standard boxes with products.

Assembling Processes.

    Assembling links into a chain.
    Assembling the fuse of a shell.
    Covering middles (i.e. creams) with chocolate.
    Joining sides and bottom of standard-sized boxes.

"Working-up" Materials. ("Machining" Processes.)

    Sewing belts and buttonholing by machine.
    Drilling, boring, etc., parts of shell-fuse.
    Lathe-work on standard 18-pound shells or any standard
    "parts" of a fuse.

Machine-tending (semi-automatic).

    Feeding machine with cartridge cases.
    Feeding, emptying, and controlling presses.

Typesetting by hand on typograph.

The  same  processes  or  crafts  are  of  course  often  found  in 
different  industries.  The  munition  industry,  for  instance, 
includes  many  processes  found  in  automobile  manufacture. 

Expressiveness 

Once  a  type  of  output  is  found  consisting  of  a  number  of 
units  which  can  be  said  to  vary  "up  or  down"  because  it 
consists  of  a  greater  or  lesser  number  of  units,  the  next  stage 
is  to  select  such  an  enumerable  kind  of  output  that  these 
variations  will  be  expressive  of  variations  in  the  degree  of 
working  capacity.  In  the  case  of  measurement  by  output 
such  expression,  if  it  exists  at  all,  will  of  course  be  "con- 
gruent," i.e.  when  working  capacity  increases  the  output 
rates  will  increase  also  and  vice  versa.  .  .  . 

Elimination  of  Ambiguity 

To  enable  the  rate  of  output  to  measure  working  capacity 
without  ambiguity  the  influence  of  factors  in  the  industrial 
situation  must  be  excluded  that  modify  output  one  way  or 
the other without passing through "capacity" first. Factors
likely so to modify output must be kept "constant," so
that  changes  in  output  cannot  possibly  be  attributed  to  any 
changes  in  these  factors  foreign  to  our  study. 

If, for instance, the output of a factory was falling from
one week to another and hours of activity had been raised,
it  would  not  be  possible  to  prove  that  the  decrease  of  output 
had  measured  a  diminution  of  working  capacity  unless  it 
were  certain  that  the  type  of  workers  and  all  other  factory 
conditions  had  remained  constant.  Otherwise  the  fall  in 
output  might  just  as  well  be  attributed  to  a  more  inexperi- 
enced set  of  hands. 

The  chief  factors  that  are  likely  by  their  inconstancy  to 
disturb  or  make  ambiguous  the  relation  of  output  and  work- 
ing capacity  are  connected  first  with  the  type  of  worker  and 
secondly  with  certain  working  conditions.     They  comprise : 

A.  The  Type  of  Worker. 

B.  The  Preparedness  for  Work. 

C.  The  Stimulus  to  Work. 

D. The Feasibility of Work.

A.  Constancy  in  Type  of  Worker:  Where  a  whole  factory's 
output is under observation it is obvious that the total may
quite  likely  be  the  product  of  an  ever-changing  set  of  indi- 
viduals or  even  of  an  increasing  or  decreasing  number. 

Where  the  number  is  changing  the  total  output  should  be 
divided by the numbers employed and expressed as a rate
per individual worker. Sometimes the actual number at
work cannot be, or at any rate has not been, ascertained.
Though the number of machines or work benches is known,
yet a few workers may have stayed away all day. In one
munition factory I found records giving the total output
per  shift  in  each  process,  irrespective  of  the  number  of  in- 
dividual girls  working  at  the  time.  But  as  it  was  to  the  in- 
terest of  the  management  to  keep  every  one  of  the  machines 
at  work,  a  reserve  of  girls  was  kept  to  be  put  to  work  in  case 
the  girls  usually  employed  did  not  appear.  Hence  it  is  not 
likely  that  the  number  actually  working  varied  much  as  be- 
tween the  dayshift  and  the  nightshift  on  the  same  date. 
Mass statistics such as these, though inexact when taken
alone,  are  often  useful  for  checking  the  results  of  intensive 
studies. 
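
The reduction of a shift total to a rate per individual worker may be sketched as follows. This is only an illustration under stated assumptions: the function name `output_per_worker` and all the figures are hypothetical and are not drawn from the factory records described above.

```python
def output_per_worker(total_output, workers_present):
    """Return output per individual worker for one shift.

    Returns None when the number actually at work was not ascertained,
    since dividing by the number of machines or benches would be a guess.
    """
    if workers_present is None or workers_present <= 0:
        return None
    return total_output / workers_present


# Hypothetical shifts: (total output, number of workers actually present).
shifts = {"day": (1200, 40), "night": (1100, 36), "unrecorded": (900, None)}
rates = {name: output_per_worker(*data) for name, data in shifts.items()}
```

Returning nothing for the unrecorded shift, rather than an estimate, keeps such mass figures out of the rate table while leaving the gross totals available for checking.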

Even  when  known,  the  rate  of  output  per  individual  is 
likely  to  diverge  from  working  capacity  if  the  employees  as 
a  whole  vary  in  their  skill  or  experience. 

A  comparison  attempted  by  the  writer  in  a  munition  fac- 
tory between  the  output  rates  of  girls  working  two  eight- 
hour  shifts  and  girls  working  one  twelve-hour  shift  had  to 
be  abandoned  because  the  number  of  girls  employed  on  the 
one  shift  was  only  half  that  on  the  newly  instituted  two-shift 
system,  hence  every  second  girl  in  the  short  shifts  had  been 
freshly  hired  and  was  inexperienced.  The  average  output 
for  the  short  shift  was  lower,  therefore,  not  because  working 
capacity had diminished among certain given human organisms
but because organisms of a lower capacity had been
added. 

Again,  at  another  munition  factory  hours  had  been  in- 
creased in  the  first  year  of  the  war  and  efficiency  had  fallen, 
but  the  latter  was  not  with  any  certainty  attributable  to  a 
diminished  working  capacity  in  the  same  individuals.  Be- 
sides the  increase  in  the  hours  of  work  there  was  a  constant 
increase  in  the  number  of  new  hands  taken  on.  In  one  de- 
partment a  great  number  left  to  form  a  new  fuse-making 
department,  and  their  places  had  to  be  filled  by  new 
workers. 

It  is  clear  enough  from  this  discussion  that  the  only  factory 
records really free from ambiguity are those specifying the
output of each individual worker. The investigator should
always  endeavor  to  compare  only  similar  work  from  the  same 
worker  or  group  of  workers. 

B.  Constant  Preparedness :  Even  when  the  type  of  worker 
is  constant,  or  when  the  output  of  exactly  the  same  workers 
is studied throughout, certain working conditions are liable
by  their  inconstancy  to  render  the  output  an  ambiguous 
measure  of  capacity. 

First  of  all,  conditions  may  not  always  be  ready  for  work 
to  take  place.  Working  time  may  be  wasted  and  not  "filled 
in"  with  work.     The  worker  may  be  waiting 

(1)  for  his  material  to  be  brought  to  him  or 

(2)  for  his  machine  to  be  repaired  or 

(3)  for  power  to  be  connected  with  his  machine. 

Conversely,  material,  machine,  and  power  may  be  waiting 
for  the  worker.  He  may  be  late  coming  in  or  late  getting 
ready  and  preparing  his  materials,  or  he  may  be  called  away 
for  payment  of  wages  or  duties  about  the  factory  or  he  may 
be  allowed  to  leave  early  at  the  end  of  the  day  or  start  his 
tidying-up  early. 

All  these  cases  of  stoppage  or  tardiness  may  be  considered 
involuntary  waste  of  time,  in  the  sense  that  the  work  did 
not  take  place,  because  physically  speaking  it  could  not  be 
performed  ;  the  worker  and  his  equipment  were  not  prepared 
for  the  task. 

In  his  table  of  output  the  investigator  must  note  separately 
the  time  that  was  thus  wasted  involuntarily,  and  that  wasted 
willingly, as in talking, resting, eating, voluntarily leaving the
room, etc. Allowance should only be made for the time lost
involuntarily.  The  investigator  must  consider  all  the  hours 
and  minutes  the  worker  actually  was  ready  to  work,  and 
only  those,  and  base  his  rate  of  output  on  that  as  denominator, 
e.g.  if  the  worker  was  prepared  only  for  40  minutes  of  the 
hour  his  output  rate  per  hour  should  be  his  actual  output 
multiplied by 60/40. The output is "corrected" in the same
proportion  as  the  nominal  time  was  to  actual  time  prepared 
for  work.  Thus,  where  output  is  reckoned  up  hourly  the 
table  might  run  somewhat  as  follows  : 


Hour     Gross       Time Wasted                     Corrected Output        Time Wasted
         Output      Involuntarily                                           Willingly

9-10     20 boxes    9:30-9:35, machine stoppage     20 × 60/55 = 21 9/11    Rest, 9:10-9:20
10-11    15 boxes    10:40-11:00, lack of materials  15 × 60/40 = 22 1/2     Leave room, 10:20-10:25
11-12    12 boxes    Call to office at 11:40         12 × 60/40 = 18         Talk, 11:30-11:35
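
The correction applied in the table can be sketched in a few lines. The function name `corrected_output` is hypothetical. The minutes lost (5, 20, and 20) are read off the table, with the call to the office at 11:40 taken to occupy the rest of that hour, which is an assumption consistent with the corrected figure of 18.

```python
from fractions import Fraction


def corrected_output(gross_output, minutes_lost_involuntarily, nominal_minutes=60):
    """Scale gross output up to a full nominal period.

    Only involuntary losses are deducted; time wasted willingly is
    deliberately left in the denominator, as the text prescribes.
    """
    minutes_ready = nominal_minutes - minutes_lost_involuntarily
    if minutes_ready <= 0:
        raise ValueError("worker was never ready to work in this period")
    return gross_output * Fraction(nominal_minutes, minutes_ready)


# The three hours of the table: (gross boxes, minutes lost involuntarily).
table = {"9-10": (20, 5), "10-11": (15, 20), "11-12": (12, 20)}
corrected = {hour: corrected_output(*row) for hour, row in table.items()}
```

Exact fractions are used here so that the results can be compared directly with the mixed numbers of the table.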

The length of the stoppages due to late arrival or early
quitting of the workers may be discovered in most factories
by  an  automatic  clock  which  stamps  the  exact  time  on  a  card 
inserted  by  each  worker  as  he  enters  or  leaves.  These  "clock- 
ing in"  and  "clocking  out"  cards  are  then  usually  taken  to 
the  wage  office. 

Stoppages  in  the  course  of  work  can  usually  only  be  noted 
by  direct  observation.  Either  the  foreman  or  the  investi- 
gator himself  must  be  prepared  to  time  any  stoppages  of 
more  than  three  minutes'  duration. 

C. Constancy of Stimulus: Now, even where industry is
as  regularized  as  it  is  in  the  factory,  there  are  many  motives 
playing  upon  the  worker  that  vary  in  force  from  time  to  time. 
The  worker  during  working  hours  must  not  only  be  constantly 
ready  and  prepared  to  work  but  he  must  be  constantly  willing 
and  eager  to  work  as  well.  The  investigator  must  make 
certain that workers are not discouraged nor "sulking," nor
yet controlling their output deliberately.

In  one  highly  organized  munition  factory  records  taken 
by  the  firm  itself  on  drilling  work  showed  that  "the  rate  of 
production  drops  heavily  whenever  the  girl  loses  confidence 
in  the  accuracy  of  her  work."  Conversely,  "a  stoppage  due 
to  breakdown,  if  repaired  so  as  to  give  the  girl  confidence, 
causes an increase of speed." One of the explanations offered
for this, namely the desire to make up for the stoppage by
faster work, is paralleled by the haste often exhibited at the
end  of  a  working  period  in  order  to  finish  off  a  given  operation 
or  complete  a  given  task.  All  these  are  cases  where  the 
stimulus is inconstant owing to variable moods, and the disturbing
factor can be exorcised by averaging out.

On the other hand, a very striking instance of the stimulus
being regularly inconstant owing to deliberate calculation
was discovered at a large English munition factory. A
certain  definite  amount  had  apparently  become  the  tradi- 
tional day's  output.  If  the  worker  approached  this  output 
earlier  in  the  day  than  was  usual  he  would  usually  slow  down 
deliberately  to  avoid  "exceeding  the  limit."  To  detect  such 
limitation  of  output  that  is  not  necessarily  due  to  diminished 
working  capacity,  the  investigator  should  look  back  over  the 
records.  The  stereotyped  repetition  of  exactly  the  same 
number  of  units  of  output  by  one  worker  after  another,  week 
after  week,  is  highly  suspicious. 
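
One way of looking back over the records for such stereotyped repetition is sketched below. It is a rough screen only, not a method given in the text: the function names, the threshold of 0.8, and the daily figures are all hypothetical.

```python
from collections import Counter


def repetition_share(daily_outputs):
    """Return (most common value, share of days on which it occurred)."""
    if not daily_outputs:
        raise ValueError("no records to inspect")
    value, count = Counter(daily_outputs).most_common(1)[0]
    return value, count / len(daily_outputs)


def suspicious_workers(records, threshold=0.8):
    """List workers whose output repeats one figure on at least `threshold` of days."""
    return sorted(
        worker
        for worker, outputs in records.items()
        if repetition_share(outputs)[1] >= threshold
    )


# Hypothetical daily outputs for three workers over ten days.
records = {
    "A": [500, 500, 500, 500, 500, 500, 500, 500, 500, 500],
    "B": [500, 500, 500, 498, 500, 500, 500, 500, 500, 500],
    "C": [470, 512, 495, 530, 488, 505, 476, 521, 499, 510],
}
flagged = suspicious_workers(records)
```

A worker so flagged is merely a candidate for closer inquiry; the same figure repeated by several workers at once is the stronger sign of a traditional limit.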

In  certain  cases  the  incentive  to  work  varies  owing  to  the 
stress  of  economic  circumstances  upon  the  business  pursued. 
Work  in  offices,  for  instance,  is  subject  to  special  rush  hours 
during  the  day  when  the  mail  must  be  dispatched.  Such 
diverse  industries  as  laundries  and  telephone  exchanges  are 
subject  also  to  rush  days  during  the  week,  or  rush  hours 
during  the  day,  when  the  demands  of  their  customers  are 
heaviest. 

During  these  times  the  factory  or  office  management  will 
incite  its  staff  to  special  efforts  and  any  slackening  will  lead 
more  readily  to  dismissal  than  at  other  times.  As  a  result, 
output  will  rise  during  the  rush.  In  the  office  of  a  munition 
factory,  for  instance,  a  typist  working  from  a  dictaphone  was 
found  to  average  anything  from  2.16  to  3.83  lines  a  minute 
from 5 to 5:45 p.m., when dictaphone records had to be
immediately  transcribed  into  letters,  but  her  average  at 
other  times  was  about  two  lines.  This  did  not  mean  that 
her  working  capacity  was  greater  at  5  in  the  evening  but 
probably  that  the  same  capacity  was  stimulated  to  greater 
efforts. 

The  constant  desire  to  earn  high  wages  can  be  relied  upon 
as  an  incentive  to  work  to  full  capacity,  and  an  incentive 
strong  enough  to  overcome  all  the  other  various  motives, 
only  when  such  wages  are  paid  on  a  piece  basis ;  that  is  to 
say,  when  the  amount  of  earnings  depends  on  the  amount 
of  work  done.  Investigators  are  strongly  advised  not  to  make 
records  of  outputs  under  a  time-wage  system  or  even  under 
a  piece-wage  system  that  is  strongly  digressive  [sic]  (where 
the  greater  the  output  the  less  in  proportion  is  paid  in 
wages)  unless  discipline  and  the  fear  of  losing  employment 
and  all  wages  are  unusually  potent. 

Above  all,  output  produced  under  different  scales  of  wages 
should  never  be  compared.  Overtime  work,  for  instance, 
is  often  paid  at  one  and  a  quarter  or  one  and  a  half  times  the 
piece  rate  paid  for  work  during  the  normal  working  day  and 
extra  work  on  Sundays  is  often  paid  double.  As  a  result 
workers  will  tend  to  "go  easy"  in  ordinary  hours  or  on  week- 
days and  reserve  their  strength  for  the  overtime  and  the 
Sunday  work.  Output  will  vary  accordingly  but  it  will 
furnish  no  clear  indication  of  working  capacity. 

A  similar  variation  in  what  is  after  all  the  main  incentive 
in  modern  industry,  namely  the  "economic"  motive,  may 
sometimes  be  found  owing  to  the  maladjustment  of  different 
wage  systems.  In  a  small  munition  factory  near  London, 
though  piece  wages  were  nominally  being  paid  both  on  an 
eight-hour and a twelve-hour shift, girls working the short
shift were in certain processes being remunerated in fact only
by a time-wage, since they knew, or thought they knew,
beforehand  that  they  could  not  produce  enough  output  in 
the  shorter  hours  to  earn  more  than  the  minimum  hourly 
time-wage  which  was  guaranteed  them  by  a  trade-union 
agreement. On the long shift, therefore, girls were likely
to  be  "trying"  much  harder  than  on  the  short  shift. 

When  the  main  incentive  is  not  a  constant  force  output  data 
are rendered useless. The degree of inconstancy cannot be
measured  accurately  and  the  investigator  is  warned  never  to 
choose  records  under  such  conditions. 

D. Constancy in Feasibility: To measure working capacity
unambiguously,  variations  in  output  must  obviously  not  be 
due  to  variations  in  such  foreign  circumstances  as  the  quality 
of  the  materials  and  of  the  machines  used  in  the  work  or  to 
the  quality  of  the  lighting. 

Lighting,  besides  influencing  output  indirectly  through  its 
influence  on  working  capacity,  particularly  that  of  the  eyes, 
may affect the ease of operation directly and physically by
its  influence  on  the  visibility  of  the  material  equipment.  The 
Industrial  Commission  of  Wisconsin  found  that  a  certain 
steel  plant  by  merely  changing  its  system  of  lighting  increased 
its  output  at  night  by  over  10  per  cent,  and  undoubtedly  any 
excess  of  output  by  day  over  that  at  night  is  in  part  attribut- 
able to  the  greater  power  and  more  equal  distribution  of  day- 
light. In  certain  processes,  however,  artificial  light  can  more 
easily  be  centered  on  the  work  and  glare  can  be  avoided. 

The  same  amount  of  a  given  kind  of  output  if  produced 
from  different  machines  may  have  involved  quite  a  different 
ease  of  production ;  and  even  similar  machines  will  vary 
substantially  in  ease  of  production  according  as  they  are 
oiled,  connected  with  the  power,  etc.  The  investigator  should 
hesitate,  therefore,  before  classing  as  identical  even  similarly 
named and similar-looking machines. The slightest difference,
when the machine is at work, in the methods of driving,
feeding,  and  controlling  it,  and  guiding  the  material  will  pro- 
duce vast  differences  in  the  feasibility  of  a  given  operation. 

Raw  material,  even  of  exactly  the  same  name,  when  drawn 
from  different  parts  of  the  globe  is  likely  to  differ  greatly  in 
the  ease  with  which  it  can  be  handled  —  in  its  softness,  malle- 
ability, pliability,  etc.  Again,  it  is  well  known  that  cotton 
thread  while  being  spun  breaks  less  easily  in  a  humid  than 
in  a  dry  atmosphere. 

The  quality  of  the  raw  material  supplied  may  vary  also 
according  to  the  skill  of  the  operator  who  prepared  it.  Thus 
in  cotton-spinning,  the  number  of  threads  that  break  on  the 
slubbing frame depends largely on the skill displayed in the
drawing processes that just precede the slubbing.

Because  of  the  enormous  differences  in  feasibility  of  any 
given  output  due  merely  to  differences  in  factory  equipment 
and  technique  —  lubrication,  lighting,  materials,  machines, 
and  also  to  factory  organization  —  it  is  inadvisable  for  any 
investigator to attempt to compare the working capacity in one
factory  directly  with  that  in  another. 

Sources  of  Record 

The  method  of  collecting  output  data  that  is  most  likely 
to  be  accurate  is  for  the  investigator  himself  to  watch  a 
group  of  workers  and  note  their  output,  staying  in  the  factory 
day  in  and  day  out,  and  this  method  has  the  advantage  of 
continually  suggesting  to  the  investigator  new  facts  of  signifi- 
cance and  new  methods  of  recording  them.  For  instance, 
as  I  watched  the  output  of  four  girls  assembling  bicycle  chains 
with  a  press  driven  by  foot,  for  two  days  of  eleven  hours  each, 
I  observed  clandestine  meals  and  rests  taken  unofficially  and 
how  the  rests  were  spent.  Further,  struck  by  the  constant 
rhythm  of  the  girls'  motions,  I  was  led  to  some  new  investi- 
gations into  the  value  of  rhythm  as  a  stimulus. 


However,  the  personal  collection  of  sufficient  output 
data  to  establish  conclusions  would  require  a  whole  army  of 
investigators,  and  even  then  the  presence  of  the  investigator 
is  only  too  likely  to  disturb  and  make  unrepresentative  the 
very  facts  he  wishes  to  secure  in  their  native  state,  as  actuali- 
ties of  industrial  life.  Indeed  I  found  that  the  average  out- 
put of  the  four  chain-assemblers  was  at  a  speed  considerably 
higher  than  usual  on  the  days  I  watched,  being  7.10  chains 
per  hour  as  against  from  5.85  to  6.80  recorded  in  the  books 
for  previous  weeks.  In  spite  of  a  tactful  explanation  of  my 
purely  scientific  purpose,  the  presence  of  a  stranger  making 
strange  notes  may  have  inspired  a  fear  of  taking  very  long 
rest  pauses  or  of  indulging  too  much  in  conversation.  Where, 
however, as in "scientifically managed" factories, the workers
are  accustomed  to  being  time-studied,  the  disturbance  due 
to  this  factor  will  be  much  smaller. 
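
The size of the disturbance in the chain-assembling case can be put in percentage terms from the figures quoted above. The short sketch below does only that; the helper name `excess_over` is hypothetical, and the result says nothing about how much of the excess is due to the observer rather than to other causes.

```python
def excess_over(observed, baseline):
    """Percentage by which `observed` exceeds `baseline`."""
    return 100 * (observed - baseline) / baseline


observed_rate = 7.10               # chains per hour on the days watched
book_low, book_high = 5.85, 6.80   # range recorded in the books for previous weeks

excess_vs_high = excess_over(observed_rate, book_high)
excess_vs_low = excess_over(observed_rate, book_low)
```

On these figures the watched rate stands some 4 per cent above the best previous week and some 21 per cent above the poorest.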

The  method  of  recording  output  which  is  least  disturbing 
to  the  worker's  ordinary  attitude  and  also  most  easily  carried 
out  is  by  use  of  automatic  registers,  of  which  the  cyclometer 
is  perhaps  the  most  familiar  type.  I  have  seen  clocks  or 
registers  attached  to  machines  such  as  looms,  stamping- 
presses,  sewing-machines,  where  each  revolution  of  the  crank 
producing  a  unit  of  output  was  duly  recorded  in  figures  which 
could  be  read  off  whenever  required.  Some  registers  are 
even  self-recording;  that  is  to  say,  instead  of  being  "read 
off"  by  human  agency  they  actuate  a  pen  which  traces  the 
curve  of  output  on  a  rotating  drum.  In  view  of  the  low  cost 
of  registers  and  the  ease  with  which  they  are  attached,  their 
use  might  well  be  extended. 

A  method  of  recording  only  slightly  more  disturbing  is  for 
a  member  of  the  factory  staff,  usually  the  foreman,  personally 
to make the record. To the worker the presence of the foreman
and his taking of notes are a part of the factory routine;
the  worker's  attitude  will  not  alter  much  from  that  of  his 
ordinary  working  mood. 

Such  records  either  personally  or  automatically  collected 
by  the  firm  may  either  be  initiated  by  the  investigator  or 
may  already  be  in  existence  when  he  begins  investigating. 
As  the  investigator  enters  the  factory  for  the  first  time,  some- 
what bewildered  perhaps,  he  should  ask  that  the  output  rec- 
ords already  collected  be  shown  him.  Never  should  he 
lose  an  opportunity  of  using  the  documents  of  industry  that 
he finds ready to hand. He cannot be urged too strongly,
however, always to subject these factory records to a detailed
scrutiny. First of all, he should visit the actual operation
in the workshop of which the record shows the output.
This  personal  visit,  especially  if  the  investigator  has  even  a 
small  knowledge  of  mechanics,  will  probably  suggest  expla- 
nations of  peculiarities  in  the  records  or  perhaps  show  up 
errors  in  recording.  Secondly,  the  output  record  itself  should 
be  carefully  checked  and  the  same  questions  put  as  though 
the  investigator  was  selecting  operations  to  study  for  him- 
self. Was  the  output  enumerable?  Was  it  expressive? 
Was it free from ambiguity, with personnel, preparedness,
incentive, and feasibility either constant or averaged out?
Records when kept in the factory books as a matter of routine
often range over a long period and cover a large number.
As  mass  statistics,  therefore,  they  will  offer  a  great  chance 
of  averaging  out  inconstant  factors  and,  even  when  not  en- 
tirely free  from  ambiguity,  they  may  often  prove  useful  in 
checking  intensive  inquiries.  I  have  used  figures  of  gross 
output  per  machine,  irrespective  of  possible  absences  of  the 
workers,  in  a  whole  department  making  millions  of  rifle 
cartridge-cases  per  week,  to  check  a  comparison  of  night  and 
day  efficiencies  based  on  the  weekly  output  records  of  selected 
individuals. It was very much against the interest of the
firm  to  have  any  machines  lying  idle,  so  that  absences,  or  at 
any  rate  absences  without  substitution  of  another  worker, 
were  extremely  rare. 

Seldom,  of  course,  will  these  records  of  output  have  been 
made  by  the  firm  for  the  purpose  of  studying  working  capac- 
ity; when  they  are  taken  it  usually  is  for  the  purpose  of 
computing  the  piece-wages  to  be  paid  their  workers. 

In  one  munition  factory  where  workers  are  paid  so  much 
per  thousand  rifle  cartridges  turned  out,  with  a  minimum 
guaranteed  wage  of  so  much  per  hour,  the  hours  worked  and 
the  output  on  each  day  are  noted  down  quite  simply  for  each 
individual  in  small  memorandum  books  kept  by  the  foreman, 
and  hours  and  output  are  added  up  for  the  week. 

In  another  and  larger  firm,  where  the  wage  paid  is  based 
on  a  more  complicated  system  and  where  the  output  is  more 
varied,  a  huge  "detail  sheet"  is  kept  at  the  "wages  office" 
and  filled  in  for  each  individual  worker  each  week,  being 
arranged  as  follows  :  columns  are  provided  for  the  time  at 
which  the  employee  entered  and  left  the  works ;  for  the  time 
lost  and  the  time  worked  for  each  day  of  the  week.  Each 
of  the  different  kinds  of  operation  the  employee  has  performed 
is  then  entered  item  by  item  down  a  column ;  and  opposite 
each  entry  is  stated  the  hours  worked  on  that  operation  and 
the output, both hours and output appearing under the proper
day.  Beyond  the  columns  for  each  day  are  columns  for  the 
hours  worked  and  the  output  of  each  operation  for  the  whole 
week. 

These  columns  contain  the  whole  of  the  information  on 
the  facts  of  output  rates  that  we  require ;  columns  beyond 
them  work  out  the  wages  payable  for  the  week  from  the  facts 
already  given. 
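
The part of such a detail sheet that bears on output rates might be represented as below. This is a minimal sketch, not the firm's actual form: the field names, the operations, and the week's figures are hypothetical, and the wage columns are left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Entry:
    """One cell of the sheet: an operation worked on a given day."""
    operation: str
    day: str
    hours: float
    output: int


def weekly_rates(entries):
    """Return output per hour for each operation over the whole week."""
    hours = defaultdict(float)
    output = defaultdict(int)
    for e in entries:
        hours[e.operation] += e.hours
        output[e.operation] += e.output
    return {op: output[op] / hours[op] for op in hours if hours[op] > 0}


# Hypothetical week for one worker.
sheet = [
    Entry("drilling", "Mon", 6.0, 300),
    Entry("boring", "Mon", 2.0, 80),
    Entry("drilling", "Tue", 4.0, 220),
]
rates = weekly_rates(sheet)
```

Keeping hours and output together for each operation and day is what allows a rate to be struck for one operation at a time, so that dissimilar work is not averaged together.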


REVIEW 

1. What is the denominator, and what the numerator, in the coefficient
"Output Rate"? What measures may be used to determine it, or
to reduce it to a common denominator? What are the limitations
of  each? 

2.  If  the  aim  is  to  measure  statistically  factory  output,  what  con- 
ditions may  occur  respecting 

(1)  The  type  of  worker. 

(2)  The  preparedness  for  work. 

(3)  The  stimulus  to  work. 

(4)  The  feasibility. 

Which  will  make  the  result  ambiguous  or  "indeterminate"? 
Make a list of the things under each heading as given in the text,
and  add  others  from  your  own  experience. 

3.  Who  should  take  the  record  of  output?  Why?  What  tests 
should  be  applied  to  determine  the  use  and  value  of  records?  Make 
a  list  of  them  and  compare  them  with  those  given  in  the  Text. 

4.  Is  the  above  discussion,  in  relation  to  methods  and  safeguards 
in  collecting  statistical  data,  of  universal  application?  If  so,  show 
how  they  apply  in  such  problems  as 

(1) Studying wage data as a basis for an arbitration proceeding.

(2) Studying accidents as a basis for introducing safety devices.

(3)  Analyzing  sales  as  a  basis  for  an  advertising  campaign. 

What's  in  a  Name  —  The  Cause  of  Death  ^ 

Error  in  the  Official  Record  of  Deaths  from  Tuberculosis.  — 
There  can  be  no  doubt  that  the  tuberculosis  rate  was  dimin- 
ished by  inaccurate  statement  of  the  cause  of  death  on  the 
official  certificate.  In  a  large  number  of  cases  the  cause  of 
death  certified  to  by  the  physician  was  contradicted  by  the 
history of the decedent's illness as reported by relatives.

^  Adapted  with  permission  from  "  Errors  in  Death  Registration  in  the 
Industrial Population of Fall River, Mass.," in Monthly Review, United
States  Bureau  of  Labor  Statistics,  Vol.  5,  No.  1,  July,  1917,  pp.  2-8. 


Thus,  in  cases  in  which  the  physician's  certificate  gave  some 
such  equivocal  cause  of  death  as  bronchitis  or  hemorrhage, 
or  some  terminal  conditions,  such  as  broncho-pneumonia 
or heart failure or debility, relatives of the decedent testified
that  for  possibly  a  year  or  more  before  death  the  decedent 
had  had  a  bad  cough,  had  expectorated  profusely,  had  be- 
come extremely  emaciated,  had  suffered  from  night  sweats, 
had  had  one  or  several  hemorrhages  of  bright  blood,  and  was 
the  second  or  third  in  the  family  who  had  "died  of  consump- 
tion" within  the  last  few  years,  or  had  parents  one  or  both  of 
whom  had  died  long  ago  after  years  of  such  tuberculous  mani- 
festations. Such  testimony  as  to  matters  of  simple  fact 
seems  entitled  to  considerable  credence. 

A  French-Canadian  woman,  aged  23  years,  .  .  .  for  7  years 
a  spinner  until  she  left  the  mill  because  of  cough  two  years 
before  death,  was  certified  by  her  attending  physician  (now 
dead) as having died from "bronchitis." Another attending
physician whose name is upon death certificates of two
other  family  members  did  not  "recall"  this  case.  The  seem- 
ingly tuberculous  mother  and  brother  of  decedent  affirmed 
that  the  latter  had  died  from  tuberculosis,  "just  as  her  father 
and  three  sisters  did."  These  last  mentioned  four  are  cer- 
tified as  having  died  of  tuberculosis  between  March,  1910, 
and  August,  1912,  and  are  so  recorded  in  this  study.  An- 
other sister  was  recommended  to  a  tuberculosis  hospital 
October,  1909,  and  is  said  to  have  recovered.  This  case  was 
scheduled  as  nontuberculous.  .  .  . 

A  special  canvass  was  made  to  see  just  how  commonly 
tuberculosis  was  misreported  on  the  official  death  certificate. 
There were 188 cases in which there was marked discrepancy
between  the  cause  of  death  as  given  on  the  death  certificate, 
and the cause of death suggested by the history of the decedent's
illness as given by the family. Every physician who
had  signed  one  of  these  188  certificates,  if  still  living  and  still 
in  Fall  River,  was  visited  and  questioned  about  the  death. 
By  this  process  the  probable  correctness  of  the  certified  cause 
was  satisfactorily  established  concerning  31  of  these  cases. 

In  65  of  the  remaining  157  cases  no  further  information 
was  obtainable,  because  the  certifying  physician  had  either 
died  or  left  Fall  River,  or  else  professed  inability  to  remember 
and  no  other  attending  physician  could  be  found.  In  not  a 
few  of  these  65  cases  the  histories  indicated  overwhelmingly 
that  these  deaths  were  due  to  tuberculosis.  Nevertheless, 
the  certificates  were  taken  as  correct  unless  an  admission  was 
secured  from  the  certifying  physician  that  the  recorded  cause 
of  death  was  incorrect.  Consequently  these  65  cases  have 
been  counted  as  correctly  certified. 

The  remaining  92  cases  are  either  admittedly  or  demon- 
strably cases  of  tuberculous  deaths.  .  .  .  These  92  cases 
may  be  divided  into  the  following  classes : 

1.  Those  in  which  the  certifying  physician  unequivocally 
stated  the  cause  of  death  to  be  tuberculosis.  These  numbered 
70. 

2.  Those  unequivocally  vouched  for  as  tuberculous  by 
a  physician  who  had  attended  the  decedent  in  his  last  illness 
but  had  not  signed  the  death  certificate.  Recourse  was  had 
to  these  other  physicians  only  because  in  every  one  of  these 
cases  the  physician  who  had  signed  the  certificate  had  either 
died,  left  Fall  River,  or  forgotten  all  about  the  case.  This 
forgetfulness  is  explained  by  the  fact  that  the  signers  of  the 
certificate  were  sometimes  city  physicians,  who  had  responded 
to  an  emergency  call  and  possibly  had  seen  the  decedent 
professionally  only  once.     These  cases  numbered  12. 

3.  Those  who,  after  a  sputum  examination,  had  been  re- 
corded on  city  or  hospital  records  as  tuberculous.  Of  these 
there were five.


4.  Those  stated  by  the  certifying  physician  to  have  been 
"tuberculous  probably."  Two  of  these  had  not  been  certi- 
fied as  tuberculous,  because  no  bacteriological  examinations 
of  the  sputum  had  been  made,  "and  so,"  said  the  physician 
concerning one of these, "though I knew the case was tuberculosis
I couldn't actually swear it was." This group likewise
numbered five.

As  a  result  of  this  special  canvass,  it  appears  that  not  im- 
probably one-sixth  (17  per  cent)  of  all  the  fatal  tuberculosis 
in  the  city  was  misreported  under  nontuberculous  diagnoses. 
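
The figures of the canvass can be checked against one another in a few lines. The counts are those given in the text; the variable names and the short class labels are merely illustrative. The total of tuberculosis deaths on which the 17 per cent rests is not given in this passage and is therefore not computed.

```python
discrepant = 188         # certificates at variance with the family history
confirmed_correct = 31   # certified cause upheld on inquiry
no_information = 65      # physician dead, gone, or unable to remember

remaining = discrepant - confirmed_correct   # cases still open after inquiry
tuberculous = remaining - no_information     # admitted or demonstrated

classes = {
    "stated unequivocally by certifying physician": 70,
    "vouched for by another attending physician": 12,
    "sputum examination on city or hospital record": 5,
    "'tuberculous probably'": 5,
}
# The four classes should account for every one of the tuberculous cases.
assert sum(classes.values()) == tuberculous
```

The four classes sum to the 92 cases, and the 65 undecided cases remain counted as correctly certified, so that the estimate of misreporting errs on the side of caution.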

Reasons for Erroneous Certifications of Death

The  question  of  course  arises  why  the  true  cause  should  be 
so  often  ignored  or  misleadingly  reported  on  the  death  cer- 
tificate. There  seem  to  be  several  reasons  for  this.  Some 
persons  are  sensitive  as  to  the  existence  of  a  case  of  tubercu- 
losis in  their  family  and  would  seriously  object  to  having 
such  a  cause  recorded  upon  a  certificate.  The  knowledge 
that  this  feeling  is  common  may  affect  the  physician  even  in 
cases  where  no  such  prejudice  exists.  But  apparently  by  far 
the  most  effective  reason  is  the  attitude  of  some  of  the  in- 
surance companies  which  may  delay  payment  of  policies  of 
decedents  officially  certified  as  having  died  from  tuberculosis 
and  which  also  not  uncommonly  refuse  to  insure  other  mem- 
bers of  the  family  of  such  a  decedent.  Physicians  when  asked 
about  these  variant  cases  occasionally  admitted  that  the 
certificates  were  designedly  misleading,  but  justified  them 
on  the  ground  of  personal  financial  expediency  arising  from 
intense  medical  competition,  and  on  the  added  ground  that 
sometimes  only  through  such  registration  practices  could 
the  decedent's  family  secure  promptly  the  amount  they  were 
entitled  to  from  the  insurance  companies. 


Error  in  Official  Record  of  Decedent's  Occupation.  —  In 
addition  to  the  errors  concerning  the  causes  of  death,  whether 
principal  or  contributory,  the  records  were  found  to  be  seri- 
ously inaccurate  in  their  statements  concerning  the  decedent's 
occupation.  Fortunately  it  was  possible  to  correct  these 
errors  to  a  very  considerable  degree,  far  more  so  than  to  cor- 
rect errors  in  the  alleged  causes  of  death.  As  stated  above, 
the  physician's  official  statement  as  to  the  cause  of  death 
was  accepted  unless  the  original  certification  was  admittedly 
or  evidently  wrong.  This  policy  was  followed  no  matter 
how  seriously  the  correctness  of  the  certificate  was  doubted. 
But  a  similar  adherence  to  the  record  was  not  considered 
necessary  in  regard  to  the  statement  of  the  decedent's  occupa- 
tion, this  being  a  matter  on  which  the  physician's  profes- 
sional training  would  have  no  bearing,  and  of  which  neither 
he nor the hurried and sometimes careless undertaker probably
had personal knowledge. When, therefore, a statement
by  relatives  or  friends  as  to  the  occupation  of  a  given  dece- 
dent differed  from  that  of  the  death  certificate,  the  former 
was  taken  as  authoritative. 

The  errors  of  the  death  certificates  as  to  occupation  were 
both  of  omission  and  commission.  Persons  who  were  really 
cotton-mill  operatives  were  not  so  recorded.  Others  were 
set  down  as  operatives  who  had  never  worked  in  a  cotton  mill 
or  who  had  not  done  so  for  more  than  two  years  preceding 
death.¹ The former error was the more common among
female  and  the  latter  among  male  decedents. 

The  extent  of  these  errors  as  accurately  determined  in 
Fall  River  for  the  whole  eight-year  period  —  1905  to  1912  — 
shows most conclusively the seriousness of the misapprehension
which would be caused by using the official certificates
without investigation of their accuracy.

¹ A considerable part of this error is due to the vague use of the term
"operative," which is frequently employed on death certificates with nothing
to show whether the person concerned worked in cotton or woolen mills, in
dye works, bleacheries, or printeries, or in piano or hat factories.

For  the  eight-year  period  nearly  one-half  (49  per  cent)  of 
the  female  decedents  who  were  found  to  have  been  cotton- 
mill  operatives  were  not  so  recorded.  On  the  other  hand 
one-eighth  (13  per  cent)  of  the  females  recorded  as  operatives 
were  found  on  investigation  not  to  have  been  cotton-mill 
operatives.  Among  the  males  for  the  same  period,  23  per 
cent of those who were finally classed as cotton operatives
were  recorded  on  the  death  certificates  as  following  some  other 
occupation,  while  one-fourth  of  those  recorded  as  operatives 
could  not  properly  be  included  among  cotton-mill  workers. 

The  recorded  number  of  male  operative  decedents  in  Fall 
River  for  the  eight-year  period  (1905-1912)  was  915.  Of 
these  233,  or  25  per  cent,  were  found  not  to  have  been  cotton- 
mill  operatives,  while  207,  who  on  their  death  certificates 
were  assigned  to  other  occupations,  were  really  cotton-mill 
operatives  at  the  time  of  their  death.  The  real  number  of 
male  operative  decedents,  therefore,  was  889,  the  group  as 
recorded having been larger by 26 than the facts justified.

On  the  other  hand,  the  recorded  number  of  female  opera- 
tive decedents  in  Fall  River  for  the  eight-year  period  was 
548.  Of  these  71,  or  13  per  cent,  were  found  not  to  have 
been  cotton-mill  operatives,  while  459,  who  were  recorded 
either  as  having  other  occupations  or  no  occupation  at  all, 
proved  on  investigation  to  have  been  really  cotton-mill  opera- 
tives.    This  gives  a  total  of  936  decedent  female  operatives. 
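
The reconciliation in the two preceding paragraphs can be checked by simple arithmetic. The following is a minimal sketch in Python, using only the figures quoted above; the class and field names are illustrative assumptions and form no part of the original report.

```python
from dataclasses import dataclass


@dataclass
class OccupationAudit:
    """Counts for one sex, taken from the text; field names are illustrative."""
    recorded: int          # certificates recording the decedent as an operative
    wrongly_included: int  # recorded as operatives, found not to have been
    wrongly_omitted: int   # true operatives recorded under another or no occupation

    @property
    def actual(self) -> int:
        # Corrected total: drop the false entries, add the missed ones.
        return self.recorded - self.wrongly_included + self.wrongly_omitted

    @property
    def pct_recorded_in_error(self) -> float:
        # Share of the recorded operatives who were not operatives.
        return 100 * self.wrongly_included / self.recorded

    @property
    def pct_actual_missed(self) -> float:
        # Share of the true operatives whom the certificates failed to record.
        return 100 * self.wrongly_omitted / self.actual


audits = {
    "male": OccupationAudit(recorded=915, wrongly_included=233, wrongly_omitted=207),
    "female": OccupationAudit(recorded=548, wrongly_included=71, wrongly_omitted=459),
}

for sex, a in audits.items():
    print(f"{sex:<7} recorded={a.recorded:4d} actual={a.actual:4d} "
          f"recorded in error={a.pct_recorded_in_error:.0f}% "
          f"true operatives missed={a.pct_actual_missed:.0f}%")
```

The corrected totals come to 889 males and 936 females, and the percentages agree with those stated in the text (25 and 23 per cent for males, 13 and 49 per cent for females).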

Conclusions 

There  is  no  reason  to  suppose  that  the  official  registration 
of  deaths  is  more  carelessly  or  recklessly  performed  in  Fall 
River than elsewhere; indeed, in view of the advanced position
which Massachusetts has taken in regard to vital statistics
there are grounds for the opposite supposition. It is believed,
therefore, that the facts disclosed in this summary
show : 

(1)  That  there  is  urgent  need  for  a  closer  supervision  of 
death  registration,  and  for  a  sustained  effort  to  secure  greater 
accuracy  and  a  nearer  approach  to  completeness  in  the  cer- 
tificates filed. 

(2)  That  a  small  minority  of  the  physicians  of  a  city  or 
State  are  able  most  seriously  to  retard  progress  in  industrial 
hygiene  and  preventive  medicine,  through  their  failure,  ad- 
mittedly sometimes  intentional,  to  give  intelligent  compli- 
ance with  the  death  registration  requirements  of  the  law. 

(3)  That  under  present  conditions  death  certificates  need 
careful verification before any but the most general conclusions
respecting early death in industry may be safely
drawn  from  them.  In  particular,  deductions  as  to  the  prev- 
alence, the  increase,  or  the  decrease  within  any  specified  age 
group  of  fatalities  from  causes  like  tuberculosis  ...  or  as  to 
the  effect  of  a  given  occupation  upon  those  of  a  designated 
age  who  follow  it,  are  liable  to  be  wide  of  the  truth  if  based 
upon  the  death  data  of  the  registrar's  office,  unless  such  data 
are  first  subjected  to  detailed  investigation.  ... 

REVIEW 

1.  Is  the  inaccuracy  in  the  cause  of  death  due  to  reporting  or 
to  the  determination  of  the  cause? 

2.  What  were  the  causes  for  the  errors  in  the  occupations? 
What  is  an  "operative"?  Formulate  a  definition  which  can  be 
statistically  used. 

3.  Put  into  a  brief  statistical  table  the  numerical  facts  contained 
in  the  last  two  paragraphs  previous  to  the  Conclusion.  Does  the 
tabular  form  help  to  make  the  figures  "stand  out"? 


Statistical Standards in the Collection of Facts ¹

First,  facts  must  be  collected  for  a  definite  purpose.  Sta- 
tistical analysis  cannot  proceed  as  if  it  were  in  a  vacuum ;  the 
meaning  of  a  statistical  fact  is  a  function  of  the  use  to  which 
it  is  put,  and  the  costs  of  collection  are  justified  only  in  the 
realization  of  a  purpose.  For  collection  to  proceed  without 
a  definite  goal  in  mind  is  not  only  wasteful  of  time  and  money, 
but  fatal  to  the  idea  of  statistical  analysis.  Facts  are  not 
equally  good  for  all  purposes.  The  acts  of  measurement 
and  of  classification  presuppose  a  purpose.  Fruitless  in- 
vestigations carried  on  at  enormous  costs  and  resulting  in 
ill-will  on  the  part  of  those  who  are  interested  in  the  results, 
discouragement  on  the  part  of  those  who  are  undertaking 
them,  and  a  tendency  to  scout  the  idea  of  statistical  analysis 
and the function of experts, are largely if not solely traceable
to a violation of this seemingly self-evident truth.

Second,  facts  must  be  collected  in  standardized  units  and 
under  uniform  methods  of  application. 

Third,  a  sufficient  sanction  for  the  collection  or  use  of  data 
must  be  secured.  To  formulate  a  definite  purpose  for  which 
facts  are  wanted  is  the  first  condition  for  securing  this.  It 
is  generally,  but  not  always,  necessary  to  demonstrate  that 
personal  advantage  will  result  from  a  study  of  the  facts  fur- 
nished. But  often  more  than  this  narrow  appeal  may  be 
made.     Interest  in  fundamental  principles  may  be  aroused. 

Fourth,  standards  of  collection  require  that  the  full  import 
of  such  questions  as  the  following  shall  be  considered.  (1)  For 
what  periods,  under  what  conditions,  and  for  what  places  are 
the facts available? Are the purposes and methods of analysis
conditioned by the answers secured? (2) Will available
facts be given or may they be assembled; and if so in what
units, with what degree of accuracy, and with what effect?
(3) Do the schedules or forms used in collection provide for
keeping confidential the data supplied? (4) Are the units
of measurement which are employed standardized and understood?
Do they follow or run counterwise to the terminology
of the records employed? How may necessary adjustments
be made and with what effect?

¹ Adapted from Secrist, Horace, "Statistical Standards in Business
Research," Quarterly Publications of the American Statistical Association,
March, 1920, pp. 51-53.

Fifth,  statistical  standards  require  that  wherever  possible 
the  truth  or  error  of  facts  shall  be  verified.  Against  the  im- 
putation of  gullibility,  those  in  charge  of  statistical  analysis 
should  always  be  capable  of  defending  themselves.  To  take 
on  faith  the  plausible  or  to  discard  seeming  exceptions  is  not 
in  keeping  with  scientific  method.  Verification  requires  more 
than testing mechanical accuracy and removing apparent inconsistencies.
It involves an analysis of the composition
of  groups  and  totals,  and  a  scrutiny  of  the  uniformity  of 
measurement  and  the  methods  in  which  units  are  applied  for 
different  times,  places,  and  conditions. 

Sixth,  the  field  from  which  data  are  secured  must  be  ade- 
quate and  the  facts  inclusive  or  representative.  The  choice 
of  the  field  and  the  selection  of  the  facts  depend  upon  the 
purpose  for  which  analysis  is  undertaken.  A  problem  re- 
quiring inclusive  data  must  be  approached  differently  from 
one  which  may  be  studied  by  means  of  samples.  Standards 
of collection may, indeed, become standards of elimination;
and  balance  and  consistency  rather  than  simple  verification 
of  accuracy  become  the  goal. 


CHAPTER   III 
UNITS  OF  MEASUREMENTS   IN  STATISTICAL  STUDIES 

The Nature and Conditions of Statistical Measurement ¹

It  is  very  seldom  that  the  unit,  which  is  actually  used  in 
compilation,  is  that  which  the  unwary  would  imagine  from 
a  carelessly  quoted  summary,  or  that  which  a  priori  an  in- 
vestigator would  desire.  What  constitutes  a  pauper  ?  What 
entitles  a  man  to  be  included  in  the  Labor  Department's 
monthly  total  of  unemployed?  What  relation  have  the 
totals  of  paupers  or  unemployed  officially  stated  to  the  poor 
or  unfortunate  as  to  whose  condition  and  numbers  we  desire 
knowledge?  What  does  the  Labor  Department  understand 
by  an  increase  of  wages  ?  What  is  income,  and  what  relation 
has  it  to  the  total  income  published  by  the  Inland  Revenue 
Commissioners,  determined  by  numerous  Acts  of  Parliament 
and  limited  by  judicial  decisions  ?  Under  what  circumstances 
are  married  persons  returned  as  unmarried,  or  vice  versa,  for 
census  purposes?  What  is  a  room  and  what  a  tenement? 
How  is  the  value  of  wool  and  how  the  value  of  machinery  or 
of  pictures  determined  for  the  trade  accounts?  Does  the 
total  value  generally  quoted  for  our  foreign  trade  include 
bullion, ships' stores, ships' coal, ships themselves, foreign
produce bought and re-sold, or transhipped in bond? And, given
the answers to these questions, how far must they be modified
for other countries? The greatest of the difficulties in comparing
the statistics of two countries is in obtaining adequate
definitions of the units. The definition is a matter of conventional
and very elaborate delimitation, sometimes arbitrary,
sometimes dependent on law or on custom; and
till the principles of the delimitation in each special case are
known the statistics resulting cannot be used with any certainty.

¹ Adapted with permission from Bowley, A. L., "The Improvement of
Official Statistics," in the Journal of the Royal Statistical Society, September,
1908, Vol. 71, pp. 463-469.

Homogeneity.  —  It  is  frequently  the  case  that  the  most 
distinctive  attribute  of  the  unit  is  variable  in  degree  or  not 
capable  of  exact  definition,  or  that  for  other  reasons  the  units, 
similar  in  the  attributes  selected,  are  dissimilar  in  other  equally 
important  attributes.  Let  us  consider  the  contents  of  some 
well-known  totals.  The  number  of  adult  male  wage  earners 
in Great Britain and in other countries can be estimated, but
the  relation  of  these  totals  tells  very  little  about  the  labor 
power  of  the  nation,  for  the  men  in  one  country  vary  greatly 
in  skill  and  energy,  and  the  range  of  skill  and  energy  differs 
from  country  to  country.  The  amount  of  wages  received 
and  work  done  vary  from  man  to  man,  and  the  totals  are 
composed  of  units  which  are  heterogeneous  for  all  practical 
purposes.  If  we  aim  at  greater  homogeneity  by  counting 
only the skilled men, we find that "skilled" is a term not
capable  of  exact  definition.  Again,  if  we  take  Mr.  Booth's 
or  Mr.  Rowntree's  estimates  of  the  number  of  persons  below 
a  fixed  standard  of  livelihood,  we  find  at  once  that  the  dis- 
tance they  fall  below  that  standard  varies  greatly,  that  the 
pressure  of  poverty  depends  on  many  moral  qualities  and 
accidents  of  situation,  and  is  not  simply  a  function  of  deficit 
of  income,  and  consequently  that  these  totals  are  not  homo- 
geneous in  the  connection  for  which  they  are  generally  used. 
Or if we consider the total value of the exports from the
United Kingdom, we find that items included are heterogeneous
for all purposes except the balance of trade. £1000
worth  of  exports  of  any  kind  gives  rise  to  a  bill  of  exchange 
that  will  purchase  a  corresponding  amount  of  foreign  goods ; 
but  as  regards  the  employment  of  home  capital  and  home 
labor  there  is  every  possible  variety,  from  the  export  of 
coal, entirely a home product, to the export of foreign goods
which  become  entitled  to  be  called  British  by  the  process  of 
repacking,  and  further  the  relative  shares  of  capital  and 
labor  in  the  production  vary  indefinitely.  If  we  are  con- 
sidering the  profits  to  be  made  by  a  foreign  trade,  as  com- 
pared with  home  trade,  for  instance,  the  root  of  the  protec- 
tionist controversy,  we  find  that  there  is  no  necessary  rela- 
tion between  the  total  trade  and  the  amount  of  profit,  and 
that  the  various  parts  of  the  export  trade  are  probably  ex- 
tremely heterogeneous  in  this  respect.  Still  more  heteroge- 
neous is  the  total  obtained  by  adding  imports  to  exports,  a 
quantity  which  is  changed  by  some  £15,000,000  by  the  alter- 
ation of  1  d.  per  lb.  in  the  price  of  raw  cotton,  without  any 
corresponding  alteration  in  any  of  the  things  we  wish  to  meas- 
ure; and  if  we  further  divide  by  the  population  to  obtain 
the £24 of foreign trade per head of the population, which
is  given  a  conspicuous  place  in  the  Statistical  Abstract,  we 
have  a  sum  of  essentially  unlike  quantities  divided  by  a  quan- 
tity which  is  heterogeneous  in  itself  and  has  dissimilar  rela- 
tions to  the  parts  of  the  numerator,  for  in  no  sense  are  the 
various  units  of  the  population  similarly  interested  in  exports 
or  imports.  The  height  of  absurdity  is  reached  when  amounts 
obtained  in  this  way  are  compared  country  by  country. 

There  are  two  methods  by  which  such  difficulties  may  in 
some  cases  be  overcome.  The  first  is  by  subdivision  by 
qualities,  the  second  by  grading  of  quantities.  If  we  are 
comparing the number of operatives in the cotton industry
now and at a former period, the totals are heterogeneous in
sex  and  age,  and  the  comparison  is  misleading,  for  the  num- 
bers of  adults  and  children,  men  and  women,  have  not  changed 
in  the  same  ratios ;  but  if  we  subdivide  by  age  and  sex,  we 
can  get  a  fair  basis  of  comparison,  and  still  better  if  we  can 
make  a  further  classification  by  skill.  Thus  we  should  con- 
tinue to  combine  attributes  in  our  unit,  till  one  unit  is  similar 
to  another  as  regards  the  purpose  for  which  the  totals  are 
to  be  used.  A  case  in  point  is  the  birth  rate  as  corrected  by 
reference  to  the  number  of  married  or  marriageable  women 
within  certain  age  limits,  instead  of  to  the  population  at 
large.  If  it  can  be  ascertained  that  the  various  dissimilar 
parts  bear  the  same  relation  to  each  other  in  both  totals,  e.g. 
if  there  had  been  no  change  in  distribution  of  age  or  sex  or 
skill in the cotton industry, the heterogeneous totals may be
used.  The  second  method  applies  where  the  attribute,  which 
is  principally  to  be  considered,  is  susceptible  of  measurement, 
as  age  or  wage  or  income.  We  could  not  then  say  that  one 
population  was  twice  another,  but  could  group  according  to 
age,  and  compare  the  groups,  representing  them  by  curves 
or mathematical symbols; and similarly we could deal with
adult  male  wage  earners,  giving  not  only  their  number,  but 
their distribution by wages. It should be said that an average
should always be suspected, till the extent of the homogeneity
of its numerator has been tested, in relation to the
purpose  for  which  it  is  to  be  used. 

We  cannot  in  general  obtain  perfect  similarity  in  our  units, 
without  such  subdivisions  as  would  leave  us  with  a  number  of 
unrelated  units,  instead  of  with  a  statistical  total  or  average ; 
but we can often get sufficient similarity for practical purposes.
The  total  number  of  persons  relieved  under  the  Poor  Law  on 
a  given  day  conveys  no  useful  information ;  but  if  we  could 
get the number of persons of various ages and of various degrees
of physical and mental capacity tabulated, we should
be  able  to  make  useful  comparisons  from  place  to  place  and 
time  to  time. 
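
The first method, subdivision by qualities, may be illustrated by a short sketch. The counts below are hypothetical and chosen only for illustration, and the function names and the tolerance are assumptions. The sketch compares two totals, compares the same totals group by group, and reports whether the parts bear nearly the same relation to each other at both dates, in which case the text allows the heterogeneous totals to be used.

```python
def shares(groups):
    """Proportion of the total contributed by each group."""
    total = sum(groups.values())
    return {name: count / total for name, count in groups.items()}


def compare(earlier, later, tolerance=0.02):
    """Ratio of totals, ratio for each group, and whether the mix is stable.

    Both dictionaries must use the same group names. `tolerance` is an
    assumed threshold on the change in any group's share of the total.
    """
    total_ratio = sum(later.values()) / sum(earlier.values())
    group_ratios = {name: later[name] / earlier[name] for name in earlier}
    s0, s1 = shares(earlier), shares(later)
    same_mix = all(abs(s0[name] - s1[name]) <= tolerance for name in earlier)
    return total_ratio, group_ratios, same_mix


# Hypothetical numbers of operatives (thousands) at two dates.
earlier = {"men": 100, "women": 150, "children": 50}
later = {"men": 120, "women": 240, "children": 30}

total_ratio, group_ratios, same_mix = compare(earlier, later)
print(f"totals: x{total_ratio:.2f}")
for name, ratio in group_ratios.items():
    print(f"  {name:<8} x{ratio:.2f}")
print("composition unchanged:", same_mix)
```

With these invented figures the total rises by 30 per cent, while the groups move by +20, +60, and -40 per cent respectively, so the comparison of totals alone would mislead.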

Universality.  —  When  an  investigation  is  made  it  must  deal 
impartially  with  the  whole  district,  the  whole  class,  or  the 
whole  period  in  question.  The  general  method  of  attempting 
to  secure  universality  is  to  count  all  that  is  practicable  and 
ignore  the  rest,  thus  introducing  an  error  of  unknown  magni- 
tude in  the  result.  Two  alternative  or  corrective  methods 
are  possible  and  in  some  cases  easy  to  practice.  The  first 
is  to  make  a  careful  estimate  of  the  maximum  or  minimum 
differences,  which  would  be  made  in  the  total  or  average  if  the 
missing part were included. If nothing whatever is known
as  to  the  omissions,  especially  if  their  existence  is  not  even 
suspected,  of  course  no  correction  can  be  made,  but  this  is 
not  the  general  case.  Passenger  journeys  on  railways  can 
be  calculated  by  the  number  of  tickets  issued,  together  with 
an  estimate  for  the  number  of  journeys  made  by  contract- 
ticket  holders.  In  the  population  census  an  estimate  can  be 
made  for  the  travelers  and  homeless  on  the  census  night.  In 
a  wage  inquiry  a  superior  estimate  can  be  made  for  the  wage 
earners  not  counted,  and  the  greatest  possible  effect  on  the 
average  can  be  calculated.  For  the  national  income  maximum 
and  minimum  values  could  be  (but  have  not  been)  estimated 
for  the  incomes  of  non-wage  earners  not  liable  to  income  tax. 
Such  estimates  of  the  residuum  are  sometimes  difficult  and 
often  unconvincing,  but  nothing  is  gained  by  ignoring  them. 
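
The first corrective method amounts to placing limits on the average. A minimal sketch follows, assuming that the number of omitted units and a lowest and highest plausible value for each of them can be estimated; the wage figures are hypothetical and the function name is an assumption.

```python
def average_bounds(counted_total, counted_n, omitted_n, low, high):
    """Least and greatest possible average once the omitted units are included.

    Each of the `omitted_n` units is assumed to lie between `low` and `high`.
    """
    n = counted_n + omitted_n
    least = (counted_total + omitted_n * low) / n
    greatest = (counted_total + omitted_n * high) / n
    return least, greatest


# Hypothetical wage inquiry: 800 earners counted at an average of 30s. a week;
# 200 earners not counted, each believed to earn between 15s. and 25s.
counted_n, counted_avg = 800, 30
least, greatest = average_bounds(counted_n * counted_avg, counted_n, 200, 15, 25)
print(f"counted average: {counted_avg}s.; true average between {least}s. and {greatest}s.")
```

In this invented case the true average lies between 27s. and 29s., wholly below the counted average of 30s., which shows how an omission may drag the result in one direction.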

The  error  introduced  by  such  absence  of  universality  is  of 
the  kind  I  have  called  "biased,"  that  is  to  say  all  or  most  of 
the  different  parts  omitted  are  likely  to  drag  the  average  in 
the  same  direction.  It  is  the  obvious  that  are  counted, 
and  their  very  obviousness  is  an  attribute  that  differentiates 
them. In a recent American Wage Inquiry (supplementary
to the census of 1901), the biggest firms were selected. More
generally an inquirer aims at "typical firms," and very frequently
he is limited to the firms who are the least opposed
to  an  investigation.  In  none  of  these  cases  will  the  true 
average,  as  it  would  be  obtained  from  a  universal  inquiry, 
be  obtained.  The  selection  of  the  typical  firm  appears  to 
be  the  most  plausible  method,  but,  to  put  the  criticism  briefly, 
the  "mode"  will  be  obtained  instead  of  the  arithmetical 
average,  and  these  two  are  not  in  general  identical. 
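
The distinction between the "mode" and the arithmetical average can be seen in a small sketch. The wage figures are hypothetical; the "typical" firm is taken here, as an assumption, to be one paying the most common wage.

```python
from statistics import mean, mode

# Hypothetical average weekly wages (shillings) paid by seven firms.
firm_wages = [20, 22, 22, 22, 24, 26, 40]

typical = mode(firm_wages)     # the most frequent value: the "typical" firm
arithmetic = mean(firm_wages)  # what a universal inquiry would yield

print(f"mode = {typical}s., arithmetical average = {arithmetic:.2f}s.")
```

One unusually high-paying firm raises the arithmetical average above the mode, so an inquiry confined to typical firms would understate it in this invented case.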

This  leads  to  the  second  method  of  obtaining  universality, 
that is the method of samples. I have recently dealt with
this  at  length,  and  need  only  emphasize  the  chief  essential  of 
collection, that every unit in the district or class dealt with must
have  approximately  the  same  chance  of  inclusion,  and  that  the 
selection  must  deliberately  be  made  at  random ;  compared 
with  this  rule  the  number  of  units  contained  in  the  selected 
sample  is  unimportant.  The  only  test  of  the  adequacy  of 
the  sample  is  the  similarity  of  results  obtained  by  random 
subdivision  of  the  sample.  To  test  the  purity  of  London 
water  needs  the  examination  of  only  a  few  microscopic  quan- 
tities ;  to  estimate  the  earnings  of  outworkers  in  West  Ham 
would  need  a  very  extended  inquiry  before  the  accidents  of 
the  individual  samples  were  eliminated. 
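
The two rules just stated, an equal chance of inclusion for every unit and a test of adequacy by random subdivision, can be sketched as follows. The population is generated artificially for illustration only, and the function names, seeds, and sample size are assumptions.

```python
import random
from statistics import mean


def simple_random_sample(population, k, seed=None):
    """Draw k units so that every unit has the same chance of inclusion."""
    return random.Random(seed).sample(population, k)


def split_half_means(sample, seed=None):
    """Divide the sample at random into two halves and return the mean of each."""
    shuffled = list(sample)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return mean(shuffled[:mid]), mean(shuffled[mid:])


# Hypothetical population of weekly earnings (shillings), invented for illustration.
_rng = random.Random(0)
population = [round(_rng.uniform(5, 40), 1) for _ in range(5000)]

sample = simple_random_sample(population, 400, seed=1)
half_a, half_b = split_half_means(sample, seed=2)
print(f"half A mean = {half_a:.2f}, half B mean = {half_b:.2f}, "
      f"difference = {abs(half_a - half_b):.2f}")
```

If the two halves give widely different results, the sample is too small for the variability of the material, whatever its absolute size; how small a difference is acceptable is a judgment the sketch leaves to the investigator.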

Stability. — In modern societies the totals and averages
which  are  the  subject  of  statistical  measurement  are  seldom 
stationary  ;  some  fluctuate  with  extreme  rapidity  and  irregu- 
larity, some  have  fairly  regular  periodic  changes,  some  grow 
or decline slowly and steadily. In the first case, frequent
measurements are necessary for the presentment of any adequate
picture; for example, when dealing with prices in general,
with the earnings of pieceworkers, or with meteorological
statistics. In the second case, the measurements must
cover a complete period, after the length and constancy of
the period have been ascertained; for example, with pauperism,
with unemployment, or earnings in a seasonal trade. In
the third case, which applies to population statistics and
birth-,  marriage-,  and  death-rates,  and  others,  occasional 
measurements  are  sufficient,  and  intermediate  values  can  be 
interpolated.  In  all  cases,  the  frequency  of  the  measurements 
should  depend  on  the  stability  of  the  total  or  average. 
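
For the third case, intermediate values may be supplied by interpolation. The text does not name a method; the sketch below assumes the simplest one, a straight line between two measurements, and uses hypothetical census figures.

```python
def interpolate(t, t0, v0, t1, v1):
    """Straight-line estimate at time t between the measurements (t0, v0) and (t1, v1)."""
    if not t0 <= t <= t1:
        raise ValueError("t must lie between the two dates of measurement")
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Hypothetical population counted at two censuses ten years apart.
estimate_1906 = interpolate(1906, 1901, 1_000_000, 1911, 1_100_000)
print(f"estimated population in 1906: {estimate_1906:,.0f}")
```

Such an estimate is serviceable only for totals that grow or decline slowly and steadily; applied to a series of the first or second kind it would conceal the very fluctuations that matter.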

Comparability.  —  It  has  often  been  pointed  out  that  iso- 
lated statistical  totals  are  nearly  valueless,  and  that  we  need 
generally  to  study  change  or  differences ;  that  is,  we  need 
to  measure  similar  totals  differing  in  place  or  time.  When 
the  difference  is  in  place,  as,  for  example,  when  we  compare 
working-class  expenditures  in  Glasgow  and  in  London,  the 
analysis  already  given  as  to  homogeneity  and  stability  applies, 
but  the  homogeneity  must  extend  over  all  the  averages  com- 
pared ;  the  averages  must  be  estimated  by  unchanged  methods 
and  like  can  only  be  compared  with  like.  Under  this  test 
nearly  all  the  comparisons  that  have  been  made  between  the 
standards  of  living  in  different  countries  break  down. 

When  the  comparison  is  made  between  similar  totals  at 
different dates, the rules are obvious and simple, but none
the  less  neglected.  The  definition  of  the  unit  must  be  abso- 
lutely unchanged,  and  in  this  definition  I  have  included  the 
method  of  collection.  The  mistakes  and  omissions,  all  the 
"biased"  errors,  must  be  repeated.  Like  can  only  be  com- 
pared with  like.  This  is  a  hard  saying,  and  seems  to  rule 
out  all  progress  and  all  the  improvements  which  form  the 
subject  of  this  paper.  The  remedy  is  to  make  changes  in- 
frequently, but  permanently,  and  when  a  change  is  made, 
to  collect  the  information  both  by  the  unreformed  and  by 
the  reformed  method,  to  choose  the  former  for  comparisons 
with  the  past,  and  the  latter  for  comparisons  with  the  future. 
In the simple cases, where improvement is made by simple
extension, as in the case of the inclusion of the value of ships
sold  as  exports,  the  comparison  is  simple,  if  the  alteration  is 
clearly  made.  Where  improvements  of  tabulation  are  made, 
there  should  generally  be  little  difficulty  in  the  double  tabu- 
lation necessitated. 

Relativity.  —  I  am  using  this  word  for  the  logical  relation 
of  two  numbers  which  are  brought  together  as  numerator  and 
denominator,  or  as  factors.  While  comparability  concerns 
the  relation  of  like  to  like,  relativity  concerns  the  relation 
of  one  group  of  phenomena  to  a  dissimilar  group.  Thus 
the  quotients  already  mentioned,  value  of  exports  divided 
by  population  and  number  of  births  divided  by  number  of 
wives,  are  cases  of  relativity.  An  example  of  a  different  kind 
is  income  corrected  for  the  change  in  purchasing  power  of 
money.  In  order  that  a  quotient,  average,  or  rate  may  be 
perfectly  valid,  the  numerator  should  be  homogeneous  and 
the  denominator  should  be  homogeneous,  and  each  unit  in 
the  denominator  should  bear  the  same  potential  relation  to 
the  attributes  of  the  units  in  the  numerator.  Thus  the  out- 
put of  coal  per  hewer  employed,  the  number  of  ton-miles  per 
engine-hour,  and  the  average  earnings  of  self-acting  mule 
minders are valid in this sense. The work of all hewers is
to  win  coal,  the  purpose  of  the  engine's  motion  is  to  drag  loads, 
and  all  mule  minders  are  engaged  on  similar  machines  and 
paid on a similar basis. The rigidity of the rule is not, however,
necessary. Heterogeneity that leads to unbiased
errors  is  admissible  from  the  principles  of  averages,  and  when 
two  such  averages  or  rates  are  compared  the  denominator 
may  bear  any  constant  ratio  to  the  ideal  denominator.  Thus 
if  the  relative  number  of  hewers  to  all  employed  in  or  about 
a  coal  mine  is  unchanged,  we  may  compare  the  outputs  per 
head of all employed, instead of per hewer, without error.
The consideration of relativity has, to take a well-known
example, led to the "correcting factor" for urban death-rates;
and  it  is  because  of  the  possibilities  of  error  indicated  that 
such  care  is  necessary  in  interpreting  income  or  wages  in  the 
light of index numbers based on wholesale prices.
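
The remark that the denominator "may bear any constant ratio to the ideal denominator" can be written out for the coal example. The symbols are introduced here for illustration and do not appear in the original paper.

```latex
% O_i = output of coal, H_i = number of hewers, E_i = all employed,
% in period i = 1, 2.
% Assumption: hewers form the same fraction c of all employed in both periods.
\[
H_i = c\,E_i \quad (i = 1, 2)
\qquad\Longrightarrow\qquad
\frac{O_i}{E_i} = c\,\frac{O_i}{H_i},
\]
\[
\frac{O_1/E_1}{O_2/E_2}
  = \frac{c\,O_1/H_1}{c\,O_2/H_2}
  = \frac{O_1/H_1}{O_2/H_2}.
\]
```

The constant cancels, so the comparison of outputs per head of all employed is the same as the comparison of outputs per hewer. If the fraction of hewers differs between the two periods, it does not cancel and the comparison is distorted in that proportion.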

Accuracy. — It may be granted that no statistical measurement
satisfies perfectly the conditions now laid down.
Any  breach  of  these  conditions  leads  to  inaccuracy  of  re- 
sult, in  the  sense  that  the  total  or  average  or  other  result 
obtained  is  not  a  perfect  measure  of  the  group  as  defined 
for investigation, and is a still less perfect measure of
the  group  characteristic  which  we  ultimately  wish  to  know. 
The  main  thing  to  recognize  in  connection  with  official  sta- 
tistics is  that  their  accuracy,  in  spite  of  the  caution  and  sys- 
tematic verification  used  in  their  computation,  is  only  super- 
ficial. Their  universality  is  limited  by  their  methods  of 
collection.  The  number  of  births,  the  income  liable  to  in- 
come tax,  the  total  value  of  imports,  are  not  known  if  births 
are  unregistered,  income  concealed,  or  diamonds  imported 
in  passengers'  pockets.  The  measurements  are  not  closely 
fitted  to  the  quantities  of  which  we  want  knowledge.  We 
want  to  know  the  number  of  capable  persons  who  cannot 
get  work,  and  the  value  of  net  annual  earnings  in  terms  of 
the  economic  goods  on  which  they  are  spent;  the  labor 
department  returns  do  not  profess  to  tell  us  the  first,  and  it 
may  be  beyond  the  power  of  statistics  to  measure  the  second. 
Further,  most  statistics,  official  and  others,  fail  in  one  or 
other  of  the  respects  discussed.  The  result  is  that  statistical 
measurements  are  approximate,  and  should  be  frankly  given 
with their limitations explicitly described, and with the
maximum  effects  of  their  errors  estimated.  The  supplemen- 
tary inquiry,  which  such  an  estimate  often  demands,  is  very 
seldom  made.  In  simple  cases,  where  the  measurements 
are rough, but the errors unbiased, the numbers can be given
accurately in round numbers; the population to the nearest
ten  thousand,  say,  average  wages  to  the  nearest  half  crown, 
the  value  of  exports  to  the  nearest  £50,000,000,  and  so  on. 
In any case we should avoid such a statement as "the number
of  illiterate  persons  above  10  years  old  in  U.  S.  A.  in  1900  was 
6,180,069,"  where  a  very  successful  investigation  could  hardly 
get the hundred thousand correct, and the definition of illiteracy
is vague, and also has very little relation to education.

We may now summarize the characteristics of good statistics.
The unit of measurement should be absolutely defined,
its attributes should be precisely those which are related
to the inquiry, and the group should be sufficiently
homogeneous  for  the  purpose  for  which  the  measurement  is 
needed.  The  collection  should  be  actually  universal  or  based 
on  samples,  scientifically  chosen,  with  adequate  tests  of  their 
sufficiency.  A  sufficient  number  of  observations  should  be 
made  to  test  stability.  Only  statistics  collected  and  computed 
by  the  same  methods  and  on  the  same  definitions  can  be  com- 
pared. When  two  unlike  totals  are  brought  into  relation 
with  each  other,  the  causal  connection  between  the  units  of 
the  one  and  the  units  of  the  other  should  be  close  and  inevi- 
table. The  accuracy  of  the  measurement,  as  limited  by  the 
definition  of  the  unit,  should  be  calculable. 

REVIEW 

1.  Contrast  Professor  Bowley's  treatment  of  units  of  measure- 
ments with  the  discussion  in  the  Text,  Chapters  II  and  III.  In 
what  particulars  are  they  the  same ;  in  what  way  different  ? 

2.  What  would  you  say  to  the  statement  that  "  Homogeneity 
is  always  relative  ;  absolute  homogeneity  is  unthinkable  "  ?  Is  this 
true  in  the  same  degree  for  all  problems  ?     Illustrate. 

3.  Illustrate,  out  of  your  own  experience,  the  significance  in 
statistical  study  of  Bowley's  conception  of  "relativity." 

4. Suppose you were asked to list all of the brick houses in a
certain  section  of  your  city ;  all  of  the  female  servants  attached  to 
the  houses.  What  conditions  would  you  set  up  for  identification? 
Write  the  instructions  to  a  group  of  clerks  for  such  an  enumeration. 
Would these instructions be equally good for all purposes? Why?

5. Consult the United States Census of Manufactures for 1914
for  the  definition  of  an  "establishment."  Compare  this  with  the 
definition  used  by  the  Census  for  1890.  What  is  an  immigrant? 
Where  can  you  find  out?  What  is  a  business  failure  (see  Brad- 
street's,  January  31,  1920,  p.  82)?  Would  you  think  it  difficult  to 
count  such  units?     Why? 

A Mile of Track ¹

It  may  seem  that  the  mile  of  track  is  a  kind  of  statistical 
unit  that  is  very  easy  to  deal  with.  Quite  the  contrary 
is  true.  Owing  to  the  complicated  character  of  the  network 
of tracks of many companies crossing and, in effect, through
joint and often somewhat indeterminate rights of ownership,
commingling with each other in New York City, resulting
in frequent duplication or ambiguity of returns, and in the
presence  of  a  large  amount  of  "special  work"  of  all  sorts, 
instead  of  there  being  almost  exclusively  straight  rail,  meas- 
urements and  returns  of  track  mileage  furnish  data  that  are 
about  the  most  difficult  to  assemble  and  compile  of  any  of- 
fered in  this  report,  even  apart  from  occasions  for  doubt 
as  to  how  unused  track  is  dealt  with.  Under  these  circum- 
stances it  is  not  surprising  that  some  of  the  companies 
frequently  remeasure  their  property  and  revise  their  figures. 

REVIEW 

1.  Consult  the  secretary  or  some  other  official  of  the  street  rail- 
way in  your  city,  relative  to  the  meaning  of  a  mile  of  track  as  used 
by the company.

¹ Adapted with permission from Annual Report of the Public Service Commission
of the First District of the State of New York, 1913, Vol. II, p. 35.


2.  What  meaning  does  your  city  engineer  assign  to  a  mile  of 
improved  street?  Discuss  with  him  other  possible  meanings. 
Does  he  use  both  simple  units  and  coefficients?     What  are  they? 

Accidents in Public Utility Statistics ¹

The  value  of  any  kind  of  statistics  depends  largely  on 
the  quality  of  the  unit.  In  the  casualty  statistics  here  pre- 
sented the  units  dealt  with  are  cases  of  killing  or  of  injury 
inflicted  on  persons,  the  agency  being  the  street-railway 
companies of the city. In the broader sense, "injury" is
properly  the  inclusive  word,  but  it  seems  unavoidable  to 
use  it  to  mean  less  than  fatal  injuries.  In  the  present  re- 
port it  is  employed  in  this  narrower  sense. 

At  first  glance,  it  might  seem  that  there  could  be  no 
question  of  the  meaning  of  injury.  A  person  killed  is  killed. 
A  person  thrown  from  a  car  and  suffering  a  broken  arm  is 
injured.  So  far  injuries  are  discrete  and  easily  recognized 
units.  But  this  is  as  far  as  the  simplicity  goes.  In  a  col- 
lision, several  persons  may  be  mortally  injured,  but  not 
killed  outright.  To  classify  as  merely  an  injury  a  mishap 
that  results  in  death  within  an  hour  is  manifestly  incor- 
rect. On  the  other  hand,  if  a  person  has  a  weak  heart, 
and  is  severely  shaken  up  and  bruised  and  scared,  as  a  re- 
sult of  which  he  dies  of  heart  failure  within  a  month,  the 
cause  of  his  death  is  primarily  not  the  railway  accident,  but 
the  physical  weakness  that  existed  independently  of  the 
accident.  And  yet  mishaps  may  occur  to  people  of  normal 
or  even  exceptional  health  that  undermine  their  health  and 
strength  and  finally  cause  death  directly  traceable  to  the 
car  accident,   though  not  in  point  of  time  its  immediate 

¹ Adapted with permission from Annual Report of the Public Service Commission
for the First District of the State of New York, 1913, Vol. II, pp. 137-140.

consequence.  In  a  possible  suit  for  damages  it  may  be  nec- 
essary to  go  deeply  into  the  causes  of  a  death.  In  addi- 
tion to  the  question  above  indicated  it  might  have  to  be 
decided  whether  the  person  was  a  suicide  or  not.  In  these 
statistical  tables,  however,  we  are  concerned,  not  with  the 
tragedy  of  each  death,  but  with  the  numbers  of  deaths, 
and  those  numbers  taken  in  connection  with  the  volume 
and  magnitude  of  the  traffic.  To  take  the  number  killed 
outright,  add  to  them  the  number  that  happen  to  die  upon 
the  cars,  and  those  injured  who  die  at  any  time  after  the 
injury, would entail practically impossible labor in following
up  each  case.  In  accident  statistics  we  are  concerned,  not 
with  the  individual  cases,  but  only  with  representative 
averages.  From  this  point  of  view  it  is  sufficient  to  draw 
the  line  between  the  killed  and  the  injured  upon  the  basis 
of  a  fixed  interval  of  time  occurring  before  death  follows 
upon  the  accident.  In  the  present  statistics  if  death  results 
at  any  time  within  three  days,  the  case  is  counted  among 
the  killed ;  if  later,  it  is  classed  as  an  injury,  naturally  or 
presumably  a  serious  injury,  though  death  may  be  so  in- 
direct a  consequence  and  so  long  delayed  that  this  classi- 
fication is  not  certain.  From  the  point  of  view  of  exact 
science,  even  with  reference  to  statistical  needs,  the  time  in- 
terval in  question  should  be  so  defined  that  the  total  number 
of  deaths  so  classified  in  the  statistics  is  the  number  directly 
caused by the accidents, but of course some occurring within
an  interval  so  fixed  would  be  due  primarily  to  causes  other 
than  the  accidents,  while  a  compensating  number  occurring 
later  would  be  directly  due  to  them.  In  fact,  there  are  no 
means  at  present  practicable  for  determining  the  proper 
interval  in  question  thus  exactly.  But  for  purposes  of  sta- 
tistical classification  and  comparison  the  interval  may  be 
quite  arbitrarily  fixed,  and  yet  serve  very  well,  provided 
the  definition  be  clear  and  unmistakable. 
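
The fixed-interval rule described above can be sketched in a few lines of Python. This is an illustration only: the function name, the use of calendar dates, and the reading of "within three days" as an elapsed interval of at most three calendar days are assumptions, not part of the Commission's definition.

```python
from datetime import date
from typing import Optional


def classify_casualty(accident: date, death: Optional[date] = None,
                      interval_days: int = 3) -> str:
    """Return "killed" or "injured" under a fixed-interval rule.

    Assumption: "within three days" is read as an elapsed interval of at
    most `interval_days` calendar days between accident and death.
    """
    if death is None:
        return "injured"
    elapsed = (death - accident).days
    if elapsed < 0:
        raise ValueError("death cannot precede the accident")
    return "killed" if elapsed <= interval_days else "injured"


# Hypothetical cases, not taken from the Commission's tables.
print(classify_casualty(date(1913, 5, 1), date(1913, 5, 3)))   # killed
print(classify_casualty(date(1913, 5, 1), date(1913, 5, 20)))  # injured
print(classify_casualty(date(1913, 5, 1)))                     # injured
```

Whatever interval is chosen, the point of the passage holds: the rule is arbitrary but unambiguous, so two compilers given the same records reach the same counts.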


At  the  other  extreme,  there  is  a  similar  difficulty  of  classi- 
fication in  drawing  the  line  between  what  is  an  injury  and 
what  is  not  an  injury.  Laxity  of  definition  at  this  point 
is  to  be  expected,  since  much  must  depend  upon  the  bare 
statement  of  the  person  most  directly  concerned,  and  he  is 
not  likely  to  be  entirely  disinterested  in  view  of  the  possi- 
bility of  his  becoming  the  beneficiary  of  a  damage  claim. 

In  the  sub-classification  of  injuries  by  kind,  the  difficulties 
of  classification  multiply.  "Fractured  skull"  and  "ampu- 
tated limb"  are  definite  enough  or  can  be  made  so,  but  a 
"serious  injury"  not  defined  with  the  utmost  care  is  of 
quite  indeterminate  significance.  Probably  the  best  method 
of  defining  with  reference  to  seriousness,  when  the  defini- 
tion cannot  be  based  on  anatomical  facts,  is  by  way  of  the 
duration  of  disability.  A  hospital  case  is  of  course  to  be 
classed  as  serious,  but  a  visit  to  a  hospital  for  examination 
or  observation  should  not  be  so  counted.  .  .  . 

For  purposes  of  statistical  comparison  with  other  years, 
and  on  occasion  with  other  cities,  it  is  necessary  to  reduce 
the  absolute  numbers  for  accidents  to  ratios.  Since  the 
movement  of  the  cars  causes  most  of  the  accidents,  one  very 
important  ratio  is  casualties  per  car  mile,  or  what  is  in 
effect  the  same  and  is  somewhat  more  convenient  with 
reference to the relative magnitude of the terms of the
ratio,  casualties  per  100,000  car  miles.  In  relation  to  in- 
juries to  passengers,  the  ratio  to  passengers  carried  is  the 
better  basis  of  comparison.  A  still  better  ratio,  that  is, 
to  the  passenger  mile,  is  not  available  for  street-railway 
statistics.  The  greater  ratio  of  accidents  to  passengers 
on  the  steam  railroads  is  of  course  largely  explained  away 
by  the  greater  average  length  of  ride  of  steam-railroad 
than  of  street-railway  passengers.  Moreover,  this  ratio 
of accidents per passenger mile — it may well be noted —
is  probably  subject  to  qualification  with  reference  to  the 
greater  likelihood  of  accident  per  passenger  mile  at  rush 
hours.  The  effects  of  such  minor  causes  of  possible  mis- 
representativeness,  however,  entirely  disappear  in  most 
comparisons.  The  fundamental  ratio  for  injuries  to 
employees  is  casualties  per  given  round  number  of  em- 
ployees. But  since  the  employee  is  exposed  to  accident 
throughout  the  year,  instead  of  for  a  fraction  of  an  hour, 
we  should  expect  a  higher  casualty  rate  per  employee  than 
per  passenger,  except  in  so  far  as  the  difference  in  the  nature 
of  the  two  sorts  of  returns,  as  mentioned  above,  affects  the 
comparison.  But  the  number  of  employees  varies  with  the 
number  of  passengers  to  be  served,  hence  the  inclusion  of 
casualties  to  employees  and,  for  a  similar  reason,  to  "others," 
in  a  comprehensive  "per  passenger"  ratio  is  not  indefensible. 
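
The ratios named in this passage all take the same form: a count of casualties divided by a measure of exposure and scaled to a round base. A minimal Python sketch follows. The function names and all figures are hypothetical, and the base of one million passengers is an assumption made for illustration; the text fixes only the base of 100,000 car miles.

```python
def rate_per_base(casualties: int, exposure: float, base: float) -> float:
    """Casualties per `base` units of exposure (car miles, passengers, etc.)."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return casualties * base / exposure


def casualties_per_100k_car_miles(casualties: int, car_miles: float) -> float:
    """The ratio the text calls casualties per 100,000 car miles."""
    return rate_per_base(casualties, car_miles, 100_000)


def casualties_per_million_passengers(casualties: int, passengers: float) -> float:
    """Assumed base of 1,000,000 passengers; the text names no base."""
    return rate_per_base(casualties, passengers, 1_000_000)


# Hypothetical figures for one year.
print(casualties_per_100k_car_miles(250, 50_000_000))       # 0.5
print(casualties_per_million_passengers(120, 400_000_000))  # 0.3
```

Scaling to a round base changes nothing in the comparison; it only brings the two terms of the ratio to a convenient magnitude.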

REVIEW 

1.  When  is  an  "injury,"  resulting  in  death,  termed  an  accident 
in  the  statistical  usage  of  the  Public  Service  Commission?  Would 
this  criterion  be  satisfactory  for  universal  use?     Why? 

2.  What  composite  units  does  the  author  name?  Does  his 
contention  concerning  the  definition  of  these  agree  with  the  Text's  ? 

3. What are the significant coefficients for Public Utility Statistics
of Accidents? In what way is the writer's discussion of this
point  related  to  Bowley's  treatment  of  "relativity"  in  statistical 
units  ? 

Industrial Accident Rates ¹

The  purpose  of  accident  studies  is  the  very  practical 
one  of  finding  out  where  and  why  accidents  occur  and 
how  they  may  be  prevented.     The  first  stage  in  every  such 

¹ Adapted with permission from Chaney, Lucian and Hanna, Hugh S.,
"The  Safety  Movement  in  the  Iron  and  Steel  Industry,  1907  to  1917," 
Bulletin  of  the  United  States  Bureau  of  Labor  Statistics ;  Whole  Number  234, 
pp.  52-66,  June,  1918. 


study  is  necessarily  the  counting  and  analysis  of  the  acci- 
dents reported.  In  attempting  this,  two  serious  difficulties 
present  themselves :  First,  the  lack  of  a  uniform  definition 
of  what  is  to  be  regarded  as  an  "accident"  ;  and,  second,  a 
confusion  as  to  the  proper  derivation  and  use  of  accident 
rates.  Failure  to  grasp  the  importance  of  those  two  points 
has  been  responsible  for  much  loose  thinking  and  many 
false  conclusions,  and  also  has  been  responsible  for  the 
present  unsatisfactory  character  of  accident  statistics  in 
this  country. 

Definition  of  "Accident" 

First,  then,  what  is  to  be  regarded  as  an  industrial  acci- 
dent for  the  purposes  of  statistical  study?  No  definition 
has as yet been universally accepted. Some establishments
and  States  attempt  to  take  account  of  all  injuries,  however 
trivial.  Others  exclude  those  of  a  minor  character  and  take 
account  only  of  such  as  cause  a  loss  of  a  specified  amount 
of  time.  It  is  evident  that  the  accident  showing  of  a  plant 
may  be  completely  altered  by  a  change  in  definition  of  acci- 
dent, and  that  in  the  absence  of  a  uniform  definition  all 
comparisons  between  the  accident  data  of  different  plants, 
industries,  or  other  groups  become  almost  worthless.  The 
precise  definition  is  not  so  important.  The  important  thing 
is  that  the  same  definition  should  be  everywhere  observed. 

The most significant step so far taken toward such uniformity
in this country is the recent action of the International
Association of Industrial Accident Boards and Commissions
in adopting a definition of "tabulatable accidents" — i.e. a
definition not necessarily to be followed
in  the  original  reporting  of  accidents,  but  to  be  used  in  all 
statistical  tabulations.  The  definition  is  substantially  the 
same as the one long used by the Bureau of Labor Statistics
in its accident investigations and employed in the
present report:

"Tabulatable accidents, diseases, and injuries. — All accidents,
diseases, and injuries arising out of employment and
resulting  in  death,  permanent  disability,  or  any  loss  of  time 
other  than  the  remainder  of  the  day,  shift,  or  turn  in  which 
the injury was incurred, shall be classified as 'tabulatable
accidents, diseases, and injuries' and a report of all such
cases  to  some  State  or  National  authority  shall  be  required." 

The  States  which  belong  to  the  International  Association 
of  Industrial  Accident  Boards  and  Commissions  are  thus 
committed  to  a  uniform  standard  definition  of  the  accidents 
which  are  to  be  tabulated.  Some  States  may  at  first  find 
it  impossible  to  tabulate  all  accidents  as  required  by  the 
definition,  but  the  desirability  of  doing  so  is  apparent,  and 
many  have  already  made  a  beginning. 

The  Meaning  of  Accident  Rates.  —  The  second  of  the 
two  above-mentioned  difficulties  —  the  determination  and 
use  of  accurate  accident  rates  —  presents  a  more  serious  prob- 
lem than  that  involved  in  definition  of  accident.  Here  it 
is  necessary  not  only  to  have  uniformity,  but  to  decide 
upon  a  correct  method.  In  the  early  attempts  of  accident 
statistics,  attention  was  limited  to  the  number  of  accidents 
occurring  in  a  given  plant  or  group.  But  mere  numbers, 
of  course,  meant  nothing  unless  related  to  the  number  of 
persons  exposed  to  accident.  This  led  to  the  custom  of 
expressing  accident  in  terms  of  so  many  per  thousand 
workers,  and  constituted  an  approach  to  a  correct  method. 
To  say  that  a  given  industry  had  an  accident  rate  of  100 
per  thousand  workers  does  convey  a  definite  idea,  and  can 
be  compared  with  a  rate  of,  say,  300  per  thousand  workers 
in  another  industry.  But  the  method  was  extremely 
crude, because the basic figure "1000 workers" was indefinite
and variable. Usually it was derived by rough esti-
mate as  to  the  number  of  persons  employed,  such  as  aver- 
aging the  number  employed  at  different  times  of  the  year 
or  averaging  the  pay  rolls  of  the  year.  But  no  such  aver- 
age could  be  at  all  an  accurate  measure  of  what  was  wanted. 
The  number  of  days  worked  varies  in  different  plants  as  do 
also  the  daily  hours  of  labor.  Two  plants  may  have  the 
same  yearly  accident  rate,  say,  200  per  "1000  workers," 
estimated  on  the  above  basis,  but  if  one  worked  only  8 
hours  a  day  for  250  days  and  the  other  worked  12  hours 
a  day  for  365  days,  it  is  clear  that  the  real  accident  haz- 
ard is  much  higher  in  the  former  plant,  inasmuch  as  the 
same  number  of  accidents  per  1000  workers  occurred  dur- 
ing a  much  more  limited  period  of  time. 
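
The comparison of the two plants can be checked by putting both on a common basis of hours of exposure. In the sketch below, the function name and the base of one million man-hours are assumptions made for illustration; the plant figures are those given in the text.

```python
def accidents_per_million_man_hours(accidents: int, workers: int,
                                    hours_per_day: float,
                                    days_per_year: int) -> float:
    """Accidents per 1,000,000 man-hours of exposure (assumed base)."""
    man_hours = workers * hours_per_day * days_per_year
    return accidents * 1_000_000 / man_hours


# Both plants report 200 accidents per "1000 workers".
former = accidents_per_million_man_hours(200, 1000, 8, 250)   # 8 h x 250 days
latter = accidents_per_million_man_hours(200, 1000, 12, 365)  # 12 h x 365 days

print(former)             # 100.0
print(round(latter, 1))   # 45.7
print(former > latter)    # True
```

On this basis the hazard per hour worked in the former plant is a little more than twice that in the latter, although the two rates per 1000 workers are identical.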

Accident  Frequency  Rates.  —  From  this  weakness  it  be- 
came evident  that  in  order  to  get  a  rate  that  would  meas- 
ure real  hazard,  it  is  necessary  to  know  not  only  the  number 
of  men  employed,  but  also  the  time  of  their  employment. 
The  only  way  to  obtain  this  is  to  ascertain  the  actual  num- 
ber of  hours  worked  by  all  employees  for  the  year.  This 
gives  the  number  of  man-hours,  i.e.  the  theoretical  number 
of  men  required  to  produce  the  output  of  the  plant  in  one 
hour,  or  what  is  the  same  thing,  the  theoretical  number  of 
hours  required  by  one  man  to  turn  out  the  same  product. 
Man-hours  so  derived  constitute  the  correct  basis  upon  which 
to  calculate  accident  rates.  But  the  term  is  unfamiliar 
and  for  practical  purposes  it  is  convenient  to  convert  man- 
hours  into  full-time  workers.  The  full-time  worker,  as 
defined  by  the  joint  committee  of  the  International  Con- 
gress on  Social  Insurance  and  the  International  Institute 
of  Statistics,  is  one  who  works  10  hours  per  day  for  300 
days  per  annum,  making  a  total  of  3000  hours  per  annum. 

The full-time worker, or 300-day worker, so defined, may
seem  at  first  thought  to  be  a  mere  statistical  abstraction. 
It  is  true  that  the  full-time  worker,  like  the  average  man, 
is a unit of measure, not a living, breathing man, but for
the  purpose  of  accident  statistics  a  standardized  workman 
to  serve  as  a  unit  of  measure  is  absolutely  essential. 
Furthermore,  the  statistical  full-time  workman  who  is 
assumed  to  work  10  hours  a  day  for  300  days  in  the  year 
conforms  very  closely  in  most  industries  to  the  actual  work- 
man who  enjoys  good  health  and  works  every  day  the 
establishment  is  running. 

Accident statistics, to be comparable, must be stated in
terms  of  a  common  unit  of  measure.  The  300-day  worker 
is  merely  a  unit  of  measure  of  the  quantity  of  labor,  just  as 
the  yard  is  the  unit  of  measure  for  length.  The  number  of 
300-day  or  full-time  workers  is  obtained  by  dividing  the 
number of man-hours actually worked in an establishment
by 3000, the number of hours per annum assumed
to  be  worked  by  the  300-day  worker. 
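
The conversion just described is a single division. A minimal Python sketch, in which the names and the sample figure are hypothetical and only the divisor of 3000 hours comes from the text:

```python
HOURS_PER_DAY = 10
DAYS_PER_YEAR = 300
HOURS_PER_FULL_TIME_WORKER = HOURS_PER_DAY * DAYS_PER_YEAR  # 3000


def full_time_workers(man_hours: float) -> float:
    """Number of 300-day workers equivalent to `man_hours` of labor."""
    if man_hours < 0:
        raise ValueError("man_hours cannot be negative")
    return man_hours / HOURS_PER_FULL_TIME_WORKER


# A hypothetical establishment recording 6,000,000 man-hours in the year.
print(full_time_workers(6_000_000))  # 2000.0
```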

In  those  establishments  which  keep  accurate  records  of 
the hours worked by each employee every day, the man-hours
worked by the establishment can easily be obtained
from  the  records  and  hence  the  number  of  full-time  or  300- 
day  workers  can  easily  be  computed.  Few  small  estab- 
lishments, however,  keep  any  such  accurate  records  of  time 
worked.  For  the  majority  of  small  plants  it  is  necessary 
to  compute  the  number  of  man-hours  worked  and  the  full- 
time  (300-day)  workers.  The  method  suggested  by  the  con- 
ference called  by  Commissioner  Meeker,  which  met  in 
Chicago  October  12  and  13,  1914,  was  as  follows:  "If  this 
exact  information  is  not  available  in  this  form  in  the  records, 
then  an  approximation  should  be  computed  by  taking  the 
number  of  men  at  work  (or  enrolled)  on  a  certain  day  of  each 
month in the year and the average of these numbers multiplied
by the number of hours worked by the establishment
for  the  year  would  be  the  number  of  man-hours  measuring 
the  exposure  to  risk  for  the  year." 
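
The approximation quoted above may be sketched as follows. The function name and the monthly counts are hypothetical; the rule itself (average of twelve monthly counts multiplied by the hours the establishment worked in the year) is as the conference stated it.

```python
from typing import Sequence


def approximate_man_hours(monthly_headcounts: Sequence[int],
                          establishment_hours: float) -> float:
    """Average of twelve monthly headcounts times hours worked in the year."""
    if len(monthly_headcounts) != 12:
        raise ValueError("one headcount is needed for each month of the year")
    average_men = sum(monthly_headcounts) / 12
    return average_men * establishment_hours


# Hypothetical small plant: men enrolled on a fixed day of each month,
# and an establishment that ran 10 hours a day for 300 days.
headcounts = [90, 95, 100, 105, 110, 100, 100, 100, 95, 105, 100, 100]
man_hours = approximate_man_hours(headcounts, 10 * 300)

print(man_hours)         # 300000.0
print(man_hours / 3000)  # 100.0 full-time (300-day) workers
```

The result is only as good as the monthly counts are representative; a plant with sharp swings in its force between counting days would be poorly measured in this way.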

By  the  method  outlined,  true  rates  are  obtained  as  re- 
gards the  risk  of  accident  occurrence  or  frequency.  These 
rates  may  be  called  accident  frequency  rates.  Thus  if  the 
accident  frequency  rate,  so  derived,  for  the  steel  industry 
is  114  per  1000  full-time  workers,  and  is  118  for  the  machine 
building  industry,  it  is  correct  to  conclude  that  accidents 
are  less  frequent  in  the  steel  industry  than  in  machine 
building,  in  the  proportion  of  114  to  118.  All  differences 
in  the  hours  of  labor,  number  of  days  worked,  etc.,  in  the 
two  industries  have  been  duly  taken  into  account.  Again, 
if  a  given  plant  shows  an  accident  frequency  rate  of  100 
one  year,  and  90  the  next,  it  is  a  correct  conclusion  that 
accidents  have  decreased  10  per  cent  in  frequency. 
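
A frequency rate of this kind, and the comparison of one year with the next, can be sketched as below. The function names and the plant figures are hypothetical; the base of 1000 full-time workers of 3000 hours each follows the text.

```python
HOURS_PER_FULL_TIME_WORKER = 3000  # 10 hours a day for 300 days


def frequency_rate(accidents: int, man_hours: float) -> float:
    """Accidents per 1000 full-time (300-day) workers."""
    if man_hours <= 0:
        raise ValueError("man_hours must be positive")
    return accidents * 1000 * HOURS_PER_FULL_TIME_WORKER / man_hours


def percent_change(earlier: float, later: float) -> float:
    """Change from `earlier` to `later` as a percentage of `earlier`."""
    return (later - earlier) / earlier * 100


# Hypothetical plant: 228 accidents in 6,000,000 man-hours.
print(frequency_rate(228, 6_000_000))  # 114.0

# The text's example of a rate falling from 100 to 90.
print(percent_change(100, 90))         # -10.0
```

Because the denominator is hours actually worked, differences in the length of the day or the number of days run are already allowed for before any two rates are compared.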

Accident  Severity  Rates.  —  Frequency  rates  of  this  char- 
acter were  computed  and  used  in  the  report  on  accidents 
in  the  iron  and  steel  industry,  issued  by  the  Bureau  of  Labor 
Statistics in 1913. In all the establishments covered the
number  of  man-hours  worked  per  year  was  obtained  and 
the  working  force  then  reduced  to  so  many  full-time  or 
300-day  workers. 

The  method  was  found  practicable  and,  within  limits, 
highly  useful.  But  it  had  one  serious  weakness,  namely, 
that  frequency  rates,  as  the  name  indicates,  measure  the 
frequency  of  accidents,  but  take  no  account  of  the  severity 
of  the  resulting  injuries,  and  experience  has  shown  that  the 
two  things  do  not  necessarily  move  in  the  same  direction. 
The  frequency  rates  may  be  the  same  in  two  plants  in  the 
same  industry,  and  the  hazards  may  be  entirely  different 
because  one  plant  has  very  few  severe  accidents,  while  the 
other has a large proportion of serious accidents. To put
all  industries  and  all  plants  on  a  common  basis  a  system 
of  computing  accident  rates  must  be  devised  which  will 
take  into  account  the  difference  in  economic  significance 
between  the  accident  which  bruises  the  workman's  thumb 
and  the  accident  which  breaks  his  back. 

In  other  words,  what  is  needed  is  some  method  of  weight- 
ing injuries  according  to  their  severity.  Several  methods 
suggest  themselves  as  possible  —  compensation  paid,  wage 
loss,  or  time  loss.  A  compensation  system  necessarily 
weights  the  importance  of  accidents  in  fixing  a  scale  of  bene- 
fits which  aims  to  apportion  the  payment  to  the  hurt.  But 
compensation  payments  do  not  offer  the  universal  measure 
desired  because  the  benefits  differ  from  State  to  State  and 
are  also  subject  to  change  within  the  same  State. 

Wage  loss  due  to  injury  offers  perhaps  a  better  measure 
of severity, but this, too, suffers under the handicap that
wages  differ  from  place  to  place  and  from  time  to  time. 
Time  loss  as  a  measure  does  not  suffer  from  these  objections. 
An accident that causes 6 days' disability is precisely twice
as serious as one causing only three days' disability, and
this relation is always and everywhere the same.

The  days  lost  because  of  injury  may  thus  be  taken  as  the 
most  satisfactory  measure  of  the  true  hazards  of  industry 
—  of  the  burden  imposed  upon  the  worker  and  the  com- 
munity because  of  industrial  accidents.  The  only  diffi- 
culty in  its  practical  application  is  that  in  case  of  death  and 
permanent  injuries  the  time  lost  must  be  estimated.  For 
temporary  disabilities,  from  which  recovery  is  complete, 
the  time  losses  are  matters  of  record  —  2  days,  10  days, 
6  weeks,  as  the  case  may  be.  But,  if  the  accident  results  in 
death,  the  time  loss  is  not  so  clearly  measurable.  It  exists, 
however,  and  may  be  estimated  as  the  number  of  working 
days by which the worker's life was curtailed. Similar estimates
are possible in case of permanent injuries, such as loss
of  hand  or  foot. 

After  a  study  of  the  available  information  a  table  of  time 
losses  for  injuries  resulting  in  death,  permanent  total  dis- 
ability, and  permanent  partial  disability  was  determined 
upon  and  applied  in  this  report.  The  procedure  followed 
was  as  follows : 

Fatalities.  —  In  case  of  an  injury  causing  death  the  time 
loss  to  the  family  and  society  is  the  expectancy  of  pro- 
ductive working  life  of  the  deceased  workman.  It  is  not 
possible  to  learn  the  age  of  all  workmen  killed  in  industrial 
accidents ;  but  from  estimates  made  by  the  Wisconsin 
Industrial  Commission,  from  statistics  obtained  by  several 
compensation  commissions,  and  from  the  investigations 
of  the  Bureau  of  Labor  Statistics,  it  seems  reasonable  to 
estimate  that  the  average  age  of  victims  of  fatal  accidents 
is  approximately  30  years.  According  to  the  American 
life  tables,  the  life  expectancy  at  age  30  is  35  years.  This 
is  for  the  population  as  a  whole.  Workingmen  exposed 
to  all  the  hazards  of  illness  and  accident  in  industry  have  a 
shorter expectancy of life than the average for the whole
population.  The  expected  productive  life  of  workers  is 
even  shorter  than  their  life  expectancy.  Exact  data  are 
lacking,  but  in  the  light  of  all  obtainable  information  it 
seems  fair  to  estimate  the  working  time  lost  on  the  aver- 
age by  relatives  and  the  community  for  each  workman  killed 
by  accident  as  30  years,  or  9000  working  days,  counting  300 
working  days  to  the  year.  This  is  admittedly  an  estimate. 
A  mathematically  accurate  measure  is  obviously  impos- 
sible. It  is  also  unimportant.  The  main  thing  is  to  get  the 
best  possible  approximation  and  to  apply  it  to  existing  acci- 
dent statistics  for  the  purpose  of  comparing  accident  records 
plant  by  plant,  industry  by  industry,  and  year  by  year. 


Permanent Total Disabilities. — If the loss of working
time  to  families  and  to  the  community  were  the  sole  thing 
to  be  shown  in  accident  statistics,  the  same  time  loss  should 
be fixed for permanent total disabilities as for fatalities.
Permanent total disability is, however, a greater burden
to  relatives  and  the  community  than  death.  In  recogni- 
tion of  this  obvious  fact  the  time  loss  for  permanent  total 
disability  has  been  fixed  at  35  years  or  10,500  working  days. 
The relative importance or burdensomeness of permanent
total disabilities as compared with fatalities is thus established
rather arbitrarily. After further experience it may
be  advisable  to  change  the  relative  weights.  The  system 
of  weighting  used  does  recognize,  however,  the  undeniable 
fact  that  complete  permanent  incapacity  of  a  worker  is  a 
greater  burden  than  his  death ;  and  some  recognition,  even 
if  unscientific,  is  better  than  ignoring  the  obvious  facts. 

Permanent  Partial  Disabilities.  —  A  proper  weighting  for 
permanent  partial  disabilities  in  terms  of  days  lost  is  even 
more  difficult  than  for  death  and  permanent  total  disa- 
bilities. An  examination  of  the  various  compensation  acts 
in  existence,  however,  gives  a  clue  worth  following  in  the 
quest  for  some  method  of  estimating  the  severity  of  perma- 
nent partial  disabilities  in  terms  of  days  lost.  First,  it  appears 
that  all  compensation  acts  agree  in  fixing  the  loss  of  an 
arm  as  the  most  serious  injury  less  than  total  disability. 
Most  acts,  however,  seem  illiberal  in  the  amount  of  com- 
pensation granted  for  this  injury.  The  New  York  act  is 
one  of  the  most  liberal.  It  grants  for  loss  of  arm  com- 
pensation for  312  weeks,  which  is  equivalent  to  1872  work- 
ing days.  Inasmuch  as  the  New  York  scale  is  based  on  two- 
thirds  of  wages  it  may  be  assumed  that  the  entire  economic 
burden  was  recognized  to  be  one-half  greater  than  the  benefit 
actually allowed. The loss of an arm would thus be equivalent
to an economic loss of 468 weeks, or 2808 days. This
in  turn  is  equivalent  to  about  31  per  cent  of  the  allowance 
fixed  above  for  death  (9000  days)  and  27  per  cent  of  the 
time  lost  for  permanent  total  disability  (10,500  days).  This 
seemed  a  reasonable  valuation  of  the  arm  in  relation  to  per- 
manent total  disability  and  death,  and  was  thus  adopted 
for  the  scale  to  be  used  by  the  bureau. 
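
The arithmetic behind the valuation of the arm can be retraced step by step. The variable names below are illustrative; every figure is taken from the passage, with six working days assumed to the week, as the equivalence of 312 weeks to 1872 days implies.

```python
WORKING_DAYS_PER_WEEK = 6        # implied by 312 weeks = 1872 working days
DEATH_DAYS = 9000                # time loss fixed for death
PERMANENT_TOTAL_DAYS = 10_500    # time loss fixed for permanent total disability

benefit_weeks = 312                                   # New York award for loss of arm
benefit_days = benefit_weeks * WORKING_DAYS_PER_WEEK  # 1872

# The award covers two-thirds of wages, so the whole burden is taken
# to be one-half greater than the benefit.
burden_weeks = benefit_weeks * 3 // 2                 # 468
burden_days = burden_weeks * WORKING_DAYS_PER_WEEK    # 2808

print(benefit_days, burden_weeks, burden_days)          # 1872 468 2808
print(round(100 * burden_days / DEATH_DAYS))            # 31
print(round(100 * burden_days / PERMANENT_TOTAL_DAYS))  # 27
```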

Having  thus  fixed  a  time  value  for  the  arm,  it  remained 
to  value  the  other  permanent  partial  disabilities.  There 
is  a  striking  similarity  among  the  various  acts  in  the  re- 
lation of  compensation  benefits  granted  for  loss  of  an  arm 
to  those  granted  for  the  lesser  disabilities.  The  degree  of  this 
uniformity  is  indicated  by  the  table  on  p.  174. 

Because  of  the  substantial  uniformity  between  the  States 
the  scale  of  awards  of  almost  any  State  would  have  given 
approximately  the  same  relative  importance  to  minor  dis- 
memberments compared  to  loss  of  arm.  The  New  York 
scale  was  adopted  as  being  one  of  the  latest  developed, 
and  also  because  its  system  of  classification  of  injuries  was 
one  readily  adaptable  to  the  form  in  which  a  large  part  of 
the  data  secured  by  the  bureau  was  given. 

As  a  result  of  the  above  procedure  permanently  disabling 
injuries,  as  well  as  death  itself,  were  assigned  values, 
expressed in terms of a common denominator — namely, workdays
lost. These values, to repeat, are necessarily arbi-
trary, but  the  fact  that  they  are  not,  and  cannot  be,  abso- 
lutely accurate,  in  no  way  diminishes  their  usefulness  for 
the  purpose  in  view. 

[Table on p. 174, not reproduced: compensation benefits granted under the various State acts for loss of an arm and for the lesser disabilities.]

The following table brings together the time losses for
death and the more common forms of permanent disabilities
as finally adopted for the bureau's scale. Columns of
percentages based on this scale of time losses are also given,
showing, first, the relative importance of the lesser injuries
as compared with the loss of an arm, and, second, the relative
importance of time losses from death and from the
lesser injuries as compared with the time loss from permanent
total disability. Other forms or combinations of
disabilities than those shown in this list, such as minor
injuries to the eye, may be assigned intermediate values. . . .

Table II. — Time Losses Fixed for Death and Permanent Disabilities

                                 Time         Per Cent       Per Cent of
                               Losses in     of Loss of       Permanent
                                 Days           Arm        Total Disability

Death                             9,000                          85.7
Permanent total disability       10,500                         100.0
Loss of members:
  Arm                             2,808        100.0             26.7
  Hand                            2,196         78.2             20.9
  Leg                             2,592         92.3             24.7
  Foot                            1,845         65.7             17.6
  Eye                             1,152         41.0             11.0
  Thumb                             540         19.2              5.1
  One joint of thumb                270          9.6              2.6
  First finger                      414         14.7              3.9
  Second finger                     270          9.6              2.6
  Third finger                      225          8.0              2.1
  Fourth finger                     135          4.8              1.3
  Great toe                         342         12.2              3.3
  One joint of great toe            171          6.1              1.6

This  schedule  supplies  a  series  of  constants  by  which  death 
and  permanent  injuries  may  be  weighted  in  terms  of  a  com- 
mon unit  —  time  lost  in  days  —  which  is  also  the  same  unit 
as  that  used  for  measuring  temporary  disabilities.  Multi- 
plying the  number  of  deaths  and  permanent  disabilities 
by the time loss determined for each and adding the products
to the days lost through temporary disabilities, a figure
is obtained which represents the total days lost from
injuries.  Dividing  this  number,  representing  total  days 
lost,  by  the  number  of  full-time  workers  gives  as  a  quotient 
the  average  number  of  days  lost  per  full-time  worker.  This 
last  figure  may  be  called  the  accident  severity  rate,  since 
it  shows  the  burdensomeness  or  seriousness  of  the  accidents 
analyzed. 
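
The rule just stated can be written compactly. The symbols are introduced here only for illustration and are not used in the original: $n_i$ is the number of deaths or permanent disabilities of class $i$, $d_i$ the time loss in days fixed for that class in the schedule, $T$ the days lost through temporary disabilities, and $N$ the number of full-time workers. The accident severity rate $S$ is then

$$
S = \frac{\sum_i n_i d_i + T}{N}
$$

expressed in days lost per full-time worker.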

The  whole  process  of  working  out  the  accident  severity 
rate  may  be  illustrated  as  follows :  Plant  A  operated 
4,200,000  man-hours  in  1915,  requiring  1400  full-time 
(300-day,  10-hour-per-day)  workers.  During  the  year 
324  accidents  occurred,  resulting  in  1  death  and  the  loss  of 
the  following  members :  2  arms,  1  foot,  5  thumbs,  25  first 
fingers,  while  the  290  temporary  disabilities  showed  a  time 
loss  of  2790  days.  Applying  the  time  losses  in  the  above 
table  to  these  data,  the  following  results  are  obtained  : 

Table III. — Time Losses in One Plant

                                  Time Loss (in Days)
                                  Per case      Total

1 death                             9,000       9,000
2 arms                              2,808       5,616
1 foot                              1,845       1,845
5 thumbs                              540       2,700
25 first fingers                      414      10,350
290 temporary disabilities                      2,790

    Total                                      32,301

The  total  number  of  days  lost,  32,301,  divided  by  the 
number  of  full-time  workers,  1400,  gives  an  average  of  23 
days per full-time worker.  This is what is here called the
accident severity rate, expressed in terms of days. The
accident frequency rate for the same group per 1000 full-time
300-day workers would be $324 \div \frac{1400}{1000} = 231$.
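
The Plant A computation can be reproduced as a short sketch. The function and variable names are assumptions made for the illustration; the figures are those given in the text.

```python
# Scheduled time losses (days) for the injuries that occurred in Plant A.
SCHEDULE_DAYS = {"death": 9000, "arm": 2808, "foot": 1845,
                 "thumb": 540, "first finger": 414}


def full_time_workers(man_hours, days_per_year=300, hours_per_day=10):
    """Convert man-hours into 300-day, 10-hour-per-day workers."""
    return man_hours / (days_per_year * hours_per_day)


def total_days_lost(permanent_cases, temporary_days, schedule=SCHEDULE_DAYS):
    """Weight deaths and permanent injuries by the schedule, then add temporary days."""
    weighted = sum(schedule[injury] * count
                   for injury, count in permanent_cases.items())
    return weighted + temporary_days


workers = full_time_workers(4_200_000)
permanent_cases = {"death": 1, "arm": 2, "foot": 1, "thumb": 5, "first finger": 25}
temporary_cases, temporary_days = 290, 2790

days_lost = total_days_lost(permanent_cases, temporary_days)
accidents = sum(permanent_cases.values()) + temporary_cases

severity_rate = days_lost / workers             # days lost per full-time worker
frequency_rate = accidents * 1000 / workers     # accidents per 1000 full-time workers

print(workers, days_lost, accidents)                     # 1400.0 32301 324
print(round(severity_rate, 1), round(frequency_rate, 1)) # 23.1 231.4
```

Rounded to whole numbers these are the 23 days and 231 accidents per 1000 workers given in the text.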

Illustrations  of  the  Use  of  Severity  Rates 
The  preceding  paragraphs  have  explained  the  mean- 
ing of  accident  severity  rates  and  the  method  by  which  they 
are  obtained.  The  significance  of  such  rates  in  their  practi- 
cal application  is  indicated  in  the  two  following  illustra- 
tions : 

In the table below comparison is made of the accident
experience  for  a  year  of  the  iron  and  steel  industry,  as 
represented  by  a  large  plant,  and  of  the  machine-building 
industry,  as  represented  by  a  group  of  plants.  Frequency 
rates  and  severity  rates  are  shown  in  parallel  columns. 

Table IV. — Accident Rates in Steel Manufacture and in Machine Building

                         Number of      Accident Frequency Rates             Accident Severity Rates
                           300-Day     (per 1000 300-Day Workers)         (Days Lost per 300-Day Worker)
                           Workers         Permanent  Temporary                 Permanent  Temporary
Industry                            Death disability disability  Total   Death disability disability  Total

Iron and steel (1913)        7,562    1.9        4.6      108.0  114.5    16.6        2.2        2.4   21.2
Machine building (1912)    115,703     .3        3.6      114.1  118.0     2.9        1.6        1.1    5.6

Examination  of  the  columns  giving  total  frequency 
rates  and  total  severity  rates  shows  that,  on  the  basis  of 
frequency, the machine-building plants were more hazardous
than the steel plant — the respective rates being
118 as against 114.5 per 1000 full-time workers. On the
basis  of  severity,  however,  the  steel  plant  was  almost  four 
times  as  hazardous  as  machine  building  —  the  days  lost 
per  full-time  worker  being  21.2  and  5.6,  respectively.  It 
is  clear  that  as  between  these  diametrically  opposite  show- 
ings of  the  relative  hazards  of  the  two  industries,  the  severity 
rates  offer  a  decidedly  more  accurate  measure  of  true  hazard. 
In  machine  building  there  is  opportunity  for  many  minor  in- 
juries, but  the  danger  of  serious  injury  is  much  less  than  in 
the  steel  industry.     The  severity  rate  brings  out  this  fact. 
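
The contrast between the two measures is easily quantified from the total columns of Table IV. The sketch below simply forms the ratio of the steel rate to the machine-building rate under each measure; the names are illustrative.

```python
# Total rates from Table IV.
TABLE_IV = {
    "Iron and steel (1913)": {"frequency": 114.5, "severity": 21.2},
    "Machine building (1912)": {"frequency": 118.0, "severity": 5.6},
}

steel = TABLE_IV["Iron and steel (1913)"]
machine = TABLE_IV["Machine building (1912)"]

# A ratio below 1 means steel appears the less hazardous; above 1, the more.
frequency_ratio = steel["frequency"] / machine["frequency"]
severity_ratio = steel["severity"] / machine["severity"]

print(round(frequency_ratio, 2))   # 0.97
print(round(severity_ratio, 1))    # 3.8
```

By frequency the steel plant appears slightly safer; by severity it is nearly four times as hazardous.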

The  second  illustration  shows  how,  over  a  period  of 
years,  within  the  same  establishment,  accident  severity  rates 
may  run  counter  to  accident  frequency  rates.  The  next 
table  gives  data  of  this  character.  It  shows  the  accident 
experience  of  a  large  steel  plant  over  a  period  of  four  years. 
The  plant  is  one  in  which  most  serious  attention  has  been 
devoted  to  the  prevention  of  accidents. 

Table V. — Accident Experience of a Large Steel Plant, 1910 to 1913

        Number of      Accident Frequency Rates             Accident Severity Rates
          300-Day     (per 1000 300-Day Workers)         (Days Lost per 300-Day Worker)
          Workers         Permanent  Temporary                 Permanent  Temporary
Year               Death disability disability  Total   Death disability disability  Total

1910         7642    1.7        4.3      127.5  133.5    15.3        2.4        2.2   19.9
1911         5774    1.6        3.6      106.6  111.8    14.1        2.1        2.4   18.6
1912         7396     .7        6.5      146.3  153.5     6.0        5.5        2.8   14.3
1913         7562    1.9        4.6      108.0  114.5    16.7        2.2        2.4   21.3

Limiting  attention  to  the  columns  showing  total  rates,  it 
will  be  noted  that  in  1910  the  frequency  rate  was  133.5  per 
1000 300-day workers and the severity rate was 19.9 days
lost per 300-day worker. The next year, 1911, shows a decrease
in both frequency and severity. In 1912, however,
there  was  a  marked  increase  in  frequency  —  from  111.8  to 
153.5  —  but  the  severity  rate  dropped  from  18.6  to  14.3. 
In  other  words,  accidents  had  considerably  increased  in  fre- 
quency, but  they  were  less  serious  in  their  total  results.  In 
1913  this  experience  was  reversed.  A  marked  reduction 
occurred  in  accident  frequency  —  from  153.5  to  114.5  — 
while  the  severity  rate  jumped  from  14.3  to  21.3.  In  other 
words,  the  year  1913,  instead  of  being  a  "good"  year,  as  it 
might  be  assumed  to  be  under  the  system  of  frequency  rates, 
was  the  worst  of  the  four  years  covered  by  the  table. 
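
The year-to-year movements described above can be tabulated mechanically. The following sketch, with illustrative names, takes the total rates from Table V, reports the change in each measure, and flags the years in which the two measures moved in opposite directions.

```python
# (total frequency rate, total severity rate) by year, from Table V.
TABLE_V = {
    1910: (133.5, 19.9),
    1911: (111.8, 18.6),
    1912: (153.5, 14.3),
    1913: (114.5, 21.3),
}

years = sorted(TABLE_V)
diverging_years = []
for previous, current in zip(years, years[1:]):
    freq_change = TABLE_V[current][0] - TABLE_V[previous][0]
    sev_change = TABLE_V[current][1] - TABLE_V[previous][1]
    if freq_change * sev_change < 0:      # opposite signs
        diverging_years.append(current)
    print(f"{current}: frequency {freq_change:+.1f}, severity {sev_change:+.1f}")

worst_by_frequency = max(years, key=lambda y: TABLE_V[y][0])
worst_by_severity = max(years, key=lambda y: TABLE_V[y][1])
print(diverging_years, worst_by_frequency, worst_by_severity)
# [1912, 1913] 1912 1913
```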

These  illustrations  bring  up  certain  points  which  it  seems 
desirable  to  emphasize.  The  first  concerns  the  use  of  terms. 
Severity  rates  derived  in  the  manner  explained  are  expressed 
for  convenience  in  terms  of  work  days  lost.  For  instance, 
the  steel  plant  referred  to  above  is  represented  as  having  a 
severity  rate  in  1913  of  21.3  days  lost  per  300-day  worker. 
The  term  "days  lost"  as  thus  used  is  to  some  extent  a  statis- 
tical abstraction,  but  it  is  close  enough  to  concrete  fact  to 
permit  of  its  use  in  its  ordinary  sense  without  any  consider- 
able degree  of  error,  provided  that  the  weighting  scale  em- 
ployed is  a  reasonable  one.  In  any  case,  however,  the  real 
significance  of  severity  rates  is  in  their  use,  not  as  positive 
amounts  but  as  relative  amounts  as  indicating  the  relation 
between  groups.  Thus,  to  recur  to  the  example  of  the  steel 
plant  mentioned,  the  important  fact  is  that  the  severity  rate 
for  1913  shows  an  increase  over  that  for  1912  in  the  relation 
of  21.3  to  14.3. 

This  leads  to  a  second  point  which  cannot  be  too  much 
emphasized  :  The  fact  that  inasmuch  as  the  real  significance 
of  severity  rates  is  in  the  measurement  of  relative  hazards, 
the character of the weighting scale used becomes comparatively
unimportant. Thus, by changing the weights in the
scale  offered  above,  the  resulting  severity  rates  may  be  con- 
siderably altered  in  their  positive  amounts,  but  unless  the 
changes  are  of  a  very  radical  character  the  relations  between 
the  rates  for  different  groups  will  remain  substantially  the 
same.  In  other  words,  it  is  desirable  to  have  the  scale  used 
as  accurate  as  possible,  but  the  fact  that  a  completely  accurate 
scale  cannot  be  devised  does  not  impair  the  value  of  accident 
severity  rating. 
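
The point can be tested on the figures of Table IV. Because the death component of a severity rate is the number of deaths multiplied by the death weight, it scales in direct proportion to that weight. The sketch below rescales the death component for two alternative death weights, 7500 and 6000 days, which are hypothetical values chosen only for illustration, and recomputes the ratio of the steel rate to the machine-building rate.

```python
# Severity-rate components (days lost per 300-day worker) from Table IV.
SEVERITY_COMPONENTS = {
    "steel": {"death": 16.6, "permanent": 2.2, "temporary": 2.4},
    "machine": {"death": 2.9, "permanent": 1.6, "temporary": 1.1},
}
BASE_DEATH_WEIGHT = 9000   # days fixed for death in the bureau's scale


def total_severity(components, death_weight):
    """Total severity rate if death were weighted at death_weight days."""
    factor = death_weight / BASE_DEATH_WEIGHT
    return (components["death"] * factor
            + components["permanent"] + components["temporary"])


def steel_to_machine_ratio(death_weight):
    steel = total_severity(SEVERITY_COMPONENTS["steel"], death_weight)
    machine = total_severity(SEVERITY_COMPONENTS["machine"], death_weight)
    return steel / machine


# 7500 and 6000 are hypothetical alternative weights, not proposed in the text.
for weight in (9000, 7500, 6000):
    print(weight, round(steel_to_machine_ratio(weight), 2))
```

Even with the death weight cut by a third, the steel plant still shows a severity rate more than three times that of machine building, so the relation between the two groups is substantially unchanged.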

Another  fact  deserving  emphasis  is  that  severity  rates 
have  a  very  important  advantage  over  frequency  rates,  in 
that  the  effect  of  errors  in  reporting  is  minimized.  Accident 
reports  are  probably  never  absolutely  complete,  and,  as  a 
rule,  the  completeness  of  reporting  is  in  direct  proportion  to 
the  seriousness  of  injury.  The  more  serious  the  injury  the 
greater  the  likelihood  of  its  being  reported.  Frequently  the 
reporting  of  minor  injuries  is  extremely  incomplete.  Inas- 
much as  the  accuracy  of  frequency  rates  depends  upon  the 
completeness  of  accident  reports,  and  as  all  accidents  have 
the  same  weight,  a  failure  to  report  any  considerable  number 
of  minor  accidents  renders  the  rates  obtained  of  very  little 
value.  Such  is  not  the  case  with  severity  rates.  Here  the 
disabilities  are  weighted  according  to  their  importance,  and 
a  large  group  of  minor  disabilities  has  comparatively  little 
effect  upon  the  derived  severity  rate.  Thus,  from  the  ma- 
terial available  concerning  the  iron  and  steel  industry,  it  is 
estimated  that  the  total  exclusion  of  all  disabilities  of  less 
than  two  weeks  will  rarely  diminish  the  total  severity  rate 
for  that  industry  as  much  as  1  per  cent,  whereas  such  an  ex- 
clusion would  diminish  frequency  rates  as  much  as  60  per 
cent.  In  the  machine-building  industry,  according  to  data 
collected  by  the  bureau  for  that  industry,  the  corresponding 
percentages  are  7  and  70. 
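
A purely hypothetical illustration shows why the two rates respond so differently to incomplete reporting. It starts from the Plant A figures used earlier (324 accidents, 32,301 days lost, 1400 full-time workers) and assumes, for the sake of the example only, that 190 of the 290 temporary disabilities were minor cases accounting together for 600 days, and that none of them was reported.

```python
WORKERS = 1400
ACCIDENTS = 324
DAYS_LOST = 32301

# Hypothetical assumption, not from the source: the unreported minor cases.
UNREPORTED_CASES = 190
UNREPORTED_DAYS = 600


def percent_drop(full, reduced):
    """Percentage by which a rate falls when the minor cases are omitted."""
    return 100 * (full - reduced) / full


frequency_full = ACCIDENTS * 1000 / WORKERS
frequency_reported = (ACCIDENTS - UNREPORTED_CASES) * 1000 / WORKERS
severity_full = DAYS_LOST / WORKERS
severity_reported = (DAYS_LOST - UNREPORTED_DAYS) / WORKERS

frequency_drop = percent_drop(frequency_full, frequency_reported)
severity_drop = percent_drop(severity_full, severity_reported)
print(round(frequency_drop, 1), round(severity_drop, 1))   # 58.6 1.9
```

Under these assumed figures the frequency rate falls by more than half while the severity rate falls by less than 2 per cent, which is the pattern the text describes.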


Growing Recognition of the Importance of Severity Rating. —
It  is  safe  to  say  that  all  who  have  been  concerned  with  acci- 
dent studies  and  accident-prevention  work  have  felt  the 
need  of  some  system  of  severity  rating,  such  as  that  developed 
in  the  present  chapter.  The  International  Association  of 
Industrial  Accident  Boards  and  Commissions  has  recognized 
the  importance  of  the  subject  and  through  its  committee  on 
statistics  has  the  matter  now  under  consideration.  The  com- 
mittee has  unanimously  approved  the  principle  of  severity 
rating.  The  discussion  now  concerns  simply  the  scheme  of 
rating  to  be  adopted.  The  one  worked  out  and  applied  in 
the  present  report  is  believed  to  meet  the  necessary  tests  of 
a  simple,  workable  system.  It  has  already  been  approved 
and  adopted  by  a  number  of  important  establishments. 

Use  of  Rates  in  the  Study  of  Accident  Causes.  —  Frequency 
and  severity  rates,  as  above  described,  may  be  applied  to 
the measurement of accident causes. . . . Inasmuch as the
computation  of  accident  rates  according  to  causes  is  some- 
what novel,  a  brief  preliminary  description  of  the  method 
used  is  desirable. 

For  any  plant,  department,  occupation,  or  other  industrial 
group  for  which  the  amount  of  employment  and  the  number 
of  accidents  are  known,  an  accident  rate  may  be  computed. 
This  total  rate  may  then  be  apportioned  among  various 
causes  responsible  for  the  accidents.  For  example,  in  a  group 
of  blast  furnaces,  with  a  total  frequency  rate  of  200  cases  per 
1000  full-time  workers,  it  was  found  on  analysis  that  58  of 
each  200  cases  were  due  to  molten  metal,  27  to  handling  tools 
and  objects,  leaving  115  as  due  to  miscellaneous  causes.  The 
frequency  rate  of  molten  metal  as  a  cause  of  accident  in  these 
blast furnaces was, therefore, 58 per 1000 workers; of handling
tools, 27 per 1000 workers, etc.

The value of such rates to the safety man is clearly evident.
They indicate, in the example given, that molten
metal  was  the  most  important  single  cause  of  accident  in 
blast  furnaces,  and  the  one  to  which  especial  attention  must 
be  directed. 

In  the  case  just  cited,  the  department  was  taken  as  the 
unit,  the  rates  being  based  on  the  total  employment  for  the 
department.  If  a  smaller  unit,  such  as  the  occupation,  be 
used  as  a  basis,  the  rates  would  be  based  on  the  amount  of 
employment  in  the  individual  occupation.  In  the  case  of 
the  above  group  of  blast  furnaces  it  was  possible  to  isolate 
certain  important  occupations,  to  draw  accident  rates  for 
each,  and  to  apportion  such  rates  among  the  different  causes. 
Thus  it  was  found  that  while  the  frequency  rate  for  the  blast- 
furnace department  as  a  whole  was  200  per  1000  workers,  the 
frequency  rate  for  the  "cast-house  men"  was  380  per  1000 
workers  employed  in  that  occupation.  Analysis  of  causes 
of  accidents  showed  this  total  of  380  to  be  made  up  of  a  rate 
of  201  cases  from  molten  metal,  43  from  falling  objects,  and 
136  from  "miscellaneous  causes." 
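
The apportionment of a total rate among causes may be sketched as follows. The function name is an assumption made for the illustration; the case counts are those quoted in the text for the blast-furnace department and for the cast-house men.

```python
def cause_rates(total_rate, cases_by_cause):
    """Split a total accident rate among causes in proportion to the cases."""
    total_cases = sum(cases_by_cause.values())
    return {cause: total_rate * cases / total_cases
            for cause, cases in cases_by_cause.items()}


# Department as the unit: 200 cases per 1000 workers.
department = cause_rates(200, {
    "molten metal": 58,
    "handling tools and objects": 27,
    "miscellaneous causes": 115,
})

# Occupation as the unit: 380 cases per 1000 cast-house men.
cast_house = cause_rates(380, {
    "molten metal": 201,
    "falling objects": 43,
    "miscellaneous causes": 136,
})

print(department["molten metal"], cast_house["molten metal"])   # 58.0 201.0
print(round(cast_house["molten metal"] / department["molten metal"], 1))  # 3.5
```

On these figures the molten-metal hazard for cast-house men is about three and a half times that for the department as a whole.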

These  occupational  cause  rates  are  even  more  valuable  to 
the  safety  man  than  are  the  preceding  departmental  cause 
rates,  as  they  indicate  still  more  precisely  the  points  of  great- 
est hazard.  Unfortunately  it  is  not  often  possible  to  use  the 
occupation  as  a  unit  as  plants  rarely  keep  records  of  employ- 
ment in  such  detail,  and  even  if  this  is  done  the  number  of 
employees  in  the  occupation  is  often  so  small  as  to  be  incon- 
clusive. 

These  cause  rates,  whether  based  on  the  department,  the 
occupation,  or  any  other  group,  are  true  accident  rates, 
analogous  to  the  death-rates  by  disease  as  used  in  mortality 
studies.  In  such  studies  it  is  customary  to  divide  the  general 
death-rate  for  a  community  into  specific  rates  for  the  various 
diseases causing death.  Thus a general death-rate of 20 per
1000 for a given city may be made up of the following specific
rates  :  tuberculosis  5,  typhoid  fever  2,  other  causes  13.  These 
rates,  it  may  be  noted,  measure  the  real  prevalence  of  the 
several  diseases  in  a  way  that  percentages  cannot  do.  Thus 
in  the  year  noted,  deaths  from  tuberculosis  constituted  25 
per  cent  of  all  deaths  (5  out  of  20).  Suppose  that  in  the  fol- 
lowing year  a  typhoid  epidemic  increased  the  typhoid  rate 
from  2  to  7  and  thus  caused  the  general  rate  to  jump  from  20 
to  25,  the  tuberculosis  death-rate  of  5  per  1000  would  re- 
main as  before,  but  expressed  in  percentages  tuberculosis 
would  have  decreased  from  25  per  cent  (5  out  of  20)  to  20 
per  cent  (5  out  of  25)  as  a  cause  of  death.  The  percentage 
change  would  suggest  a  great  decrease  in  the  tuberculosis 
hazard,  which,  however,  as  the  rate  accurately  indicates 
(5  per  1000),  remained  absolutely  stationary.  The  attempt 
to  study  causes  of  death  by  means  of  percentage  figures 
is  thus  liable  to  be  entirely  misleading.  Rates,  on  the 
other  hand,  offer  an  absolutely  reliable  measure.  This  is 
equally  true,  and  for  the  same  reasons,  in  the  study  of 
accident  causes. 
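
The tuberculosis example can be worked through in a few lines. The names are illustrative; the rates are those given in the text.

```python
def percentage_shares(rates):
    """Each cause as a percentage of the general death-rate."""
    general_rate = sum(rates.values())
    return {cause: 100 * rate / general_rate for cause, rate in rates.items()}


# Specific death-rates per 1000 of population.
first_year = {"tuberculosis": 5, "typhoid fever": 2, "other causes": 13}
# Following year: a typhoid epidemic raises that rate from 2 to 7.
second_year = {"tuberculosis": 5, "typhoid fever": 7, "other causes": 13}

print(sum(first_year.values()), sum(second_year.values()))   # 20 25
print(percentage_shares(first_year)["tuberculosis"])         # 25.0
print(percentage_shares(second_year)["tuberculosis"])        # 20.0
```

The tuberculosis rate is 5 per 1000 in both years; only its percentage share changes, and it changes solely because another cause grew.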

The  above  illustrations  of  the  use  of  cause  rates  were 
limited,  for  the  sake  of  simplicity,  to  frequency  rates.  Sever- 
ity rates  can,  of  course,  be  applied  in  precisely  the  same  way 
and  with  even  more  valuable  results,  inasmuch  as  severity 
rates,  as  pointed  out  above,  are  a  truer  measure  of  accident 
hazard than are frequency rates.

Use  of  Rates  in  the  Study  of  Nature  of  Injury,  Labor  Re- 
cruiting, and  Other  Factors.  —  Frequency  and  severity  rates 
may  also  be  applied  to  the  study  of  the  nature  of  injury  in 
precisely  the  same  way  as  they  may  be  applied,  as  described 
above,  to  the  analysis  of  accident  causes.  Thus,  in  a  group 
of  blast  furnaces,  with  a  total  frequency  rate  of  191  cases  per 
1000 full-time workers, it was found on analysis that 89 out
of each 191 cases resulted in bruises and lacerations, 45 cases
in  burns,  10  cases  in  fractures,  and  47  cases  in  various  other 
injuries.  This  being  so,  it  is  quite  correct  to  say  that  bruises 
and  lacerations  in  these  blast  furnaces  had  a  frequency  rate 
of 89 cases per 1000 workers, burns a frequency rate of 45
cases,  and  so  on.  These  are  true  rates,  with  the  same  su- 
periority to  percentages  as  a  measure  of  the  frequency  and 
severity  of  injuries  of  various  kinds  as  was  noted  to  be  true 
in  the  case  of  accident  causes. 

Moreover,  outside  the  accident  field  proper,  there  are  many 
collateral  subjects  to  which  the  rate  method  may  be  very 
profitably  applied.  An  important  instance  of  this  is  the 
employment  of  new  men.  By  relating  the  number  of  300- 
day  workers  to  the  number  of  new  men  hired  during  a  given 
time,  a  rate  is  obtained  which  may  be  referred  to  as  the  "  labor 
recruiting  rate."  There  is  an  interesting  and  important  con- 
nection between  this  "labor  recruiting"  rate  and  the  accident 
rate.  Usually,  the  taking  on  and  use  of  new  men  has  a  marked 
tendency  to  increase  the  accident  occurrence  of  a  plant. 

In  a  similar  manner,  rates  based  on  the  amount  of  employ- 
ment may  be  derived  for  production,  labor  costs,  sickness, 
and  many  other  subjects. 

REVIEW 

1.  Is  the  author's  statement  of  the  purpose  of  conducting  studies 
of  accidents  always  true?     Suggest  others. 

2.  What answer would you give to the writer's question, "What
is  to  be  regarded  as  an  industrial  accident  for  the  purpose  of  statis- 
tical study?"  Can  a  single  definition  be  given?  What  relation 
has  the  definition  to  the  purpose  ?     Illustrate. 

3.  What  are  tabulatable  accidents,  diseases,  injuries?  What 
purpose  is  kept  in  mind  in  deciding  this  question? 

4.  What  denominators  have  been  chosen  in  expressing  the 
coefficient "industrial accident rate"?  What are their respective
merits?  What is meant by a "full-time worker"?  How is the
unit  calculated  ?     Is  this  a  composite  unit  ? 

5.  What  is  the  method  adopted  for  estimating  the  "  man- 
hours"  worked  and  the  number  of  "full-time  workers"? 

6.  In  summary,  explain  the  expression  "accident  frequency 
rate." 

7.  Explain  the  expression  "  accident  severity  rate." 

8.  What  are  the  available  statistical  tests  of  severity?  Are 
they  all  equally  good?  Do  they  differ  for  different  purposes? 
Do  the  interests  of  the  injured,  the  employee,  and  the  public  coin- 
cide in  establishing  such  tests? 

9.  How  is  the  "lost-time"  test  applied  in  cases  of  fatalities, 
permanent  total  disabilities,  permanent  partial  disabilities?  Does 
this  method  appear  to  you  to  be  scientific?  Why?  Of  what 
statistical  value  in  this  connection  is  the  similarity  of  the  time 
allowance  disabilities  in  the  various  States  ? 

10.  Calculate  the  accident  severity  rate  for  the  following  ex- 
perience, using  the  schedule  of  time  losses  given  on  page  175. 

Man-hours  operated  per  year     ....     5,360,000 

Full-time  workers 1,800 

Accidents  —  one  year. 

3  deaths. 

1  loss  of  arm. 

1  loss  of  leg. 

1  loss  of  eye. 
60  loss  of  first  joint  of  thumb. 
300  temporary  disabilities,  resulting  in  loss  of  2670  days. 

11.  What  condition  may  explain  differences  in  the  accident  fre- 
quency rate  between  establishments,  plants,  or  industries?  What 
different  conditions,  if  any,  explain  different  accident  severity  rates  ? 

12.  Severity  rates  are  important  "not  as  positive  amounts  but 
as  relative  amounts."  Explain.  What  is  the  purpose  of  severity 
rates  in  the  mind  of  the  writer  in  making  this  statement?  Might 
the  statement  be  untrue  for  other  purposes  ?     Illustrate. 

13.  What  relation  has  the  error  in  reporting  accidents  to  fre- 
quency rates,  to  severity  rates  ?  What  sorts  of  error  has  the  author 
in  mind?     Might  other  sorts  affect  the  problem  differently? 

14.  Can  you  think  of  any  occasions  when  accident  frequency 
rates  would  be  of  greater  significance  than  severity  rates  ? 



15.  Does  the  following  statement  demonstrate  the  superiority 
of  severity  over  frequency  rates?  "Thus,  from  material  available 
it  is  estimated  that  the  total  exclusion  of  all  disabilities  of  less  than 
two  weeks  will  rarely  diminish  the  total  severity  rates  for  that 
industry  as  much  as  1  per  cent,  whereas  such  an  exclusion  would 
diminish  frequency  rates  as  much  as  60  per  cent." 

16.  Contrast  the  rate  and  percentage  methods  of  stating  causes 
of  deaths.  What  relation  has  the  rule  of  the  text,  "always  relate 
things to the conditions that produce them" to do with this discussion?

17.  Write  a  single  paragraph  summarizing  the  above  article 
and  showing  its  relation  to  the  general  topic  Statistical  Units  of 
Measurements. 


Some Illogical Units in Railway Statistics *

One  of  the  most  fascinating  and  important  parts  of  the 
statistician's work is the development of the best units or
bases of statistical judgment. On this side our published
railway  statistics  compare  favorably  with  any,  but  none  seem 
above  criticism,  the  principle  of  coherence  is  so  commonly 
violated.  To  be  true  and  logical  the  unit  must  be  one  based 
on  a  cause-and-effect  relationship,  that  is,  it  must  vary  with 
the  phenomena  of  whose  summation  it  is  an  index,  or  must 
indicate  the  relation  between  worker  and  work  done. 

To  illustrate  this  in  a  negative  way  take  the  much  over- 
worked train-mile.  If  it  is  to  be  our  unit  of  service,  simply  as 
an  index  of  utility  rendered,  it  need  only  have  the  quality  of 
varying in proportion to utility consumed by us. Or if it
is  to  have  a  deeper  significance,  entering  the  rate  question 
through  the  door  of  cost,  it  must  meet  the  test  of  varying 
with costs, — of indicating the relation between tractive
power (the worker) and tonnage movement (the work done).
What is the result?

* Adapted with permission from Haney, L. H., "Railway Statistics," in
Quarterly Publications of the American Statistical Association, September,
1910, Vol. 12, pp. 208-211.

As  to  service  one  first  reflects  that  the  train  is  so  lacking 
in  homogeneity  that  the  miles  it  makes  are  sadly  lacking  in 
uniform  value.  By  the  time  one  has  asked  how  many  cars 
there  were  in  that  train  ?  what  kind  of  cars  —  gondolas,  box, 
tank,  or  stock?  at  what  rate  of  speed  did  it  move?  was  it 
going  in  the  direction  of  prevailing  traffic  ?  was  it  a  train  of 
twenty  years  ago,  made  up  of  30-ton  wooden  cars  and  drawn 
by  a  little  "American-type"  locomotive,  or  one  of  to-day 
with  50-ton  steel  cars  and  a  Mallet  locomotive  ?  —  by  this 
time  one  finds  that  one  train-mile  is  so  different  from  another 
that  he  hesitates  to  accept  it  as  a  standard.  Anyhow,  he 
reflects,  what  one  wants  from  the  railway  is  not  train-miles 
but  tons  (of  goods)  moved  from  A  to  B, — ton-miles,  for 
short.  And  while,  to  be  sure,  there  will  be  on  the  average 
some relation between trains and tons moved, it is not necessary
or close enough.

This  unit,  however,  is  more  often  used  as  a  cost  index.  .  .  . 
But,  passing  over  the  difficulties  of  defining  a  train,  it  may  be 
said  that  trains  consist  of  one  or  more  locomotives  and  a 
number  of  cars.  In  this  compound  aggregate  some  costs 
vary  with  the  number  and  type  of  locomotives  (wages  of  en- 
gine crew  and  say  30  per  cent  of  fuel),  having  no  connection 
with the number of cars, or the "train." Others are peculiar
to  the  cars.  Finally  there  is  a  remainder  that  belongs  to  the 
train  as  such  (balance  of  fuel,  wages  of  train  crew,  etc.). 
Obviously  the  train-mile  will  serve  as  a  homogeneous  unit 
of  cost  only  to  the  extent  that  the  factors  peculiar  to  locomo- 
tive expense  and  car  expense  are  either  negligible  or  capable 
of  being  averaged.  Locomotive  expenses  are  far  from  being 
negligible. Therefore the value of the train-mile unit partly
depends upon an assumed average cost of locomotive-miles,
having its weakness. As a unit of cost, perhaps the chief
difficulty  comes  in  the  varying  number  and  kind  of  loco- 
motives embraced  in  the  train.  Then  there  are  the  varying 
"train resistances," depending on speed, number of cars,
grades  and  curves,  etc. 

The  theoretical  lack  of  relation  between  train-miles  and 
expense  of  performance  is  illustrated  by  the  following  rela- 
tive figures : 


                          Average Cost     Operating
Year     Train-miles     per Train-mile     Expenses     Ton-miles

1897         100              100             100           100
1902         117              127             148           165
1907         146              158             232           242
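
How loosely train-miles follow expense may be seen by dividing the expense index by each unit in turn. The sketch below is illustrative only; it works from the rounded relative figures above, so the implied cost per train-mile agrees with the printed column only to within about one point.

```python
# Relative figures (1897 = 100) from the table above.
INDEX = {
    1897: {"train_miles": 100, "operating_expenses": 100, "ton_miles": 100},
    1902: {"train_miles": 117, "operating_expenses": 148, "ton_miles": 165},
    1907: {"train_miles": 146, "operating_expenses": 232, "ton_miles": 242},
}

expense_per_train_mile = {}
expense_per_ton_mile = {}
for year, row in INDEX.items():
    expense_per_train_mile[year] = round(
        100 * row["operating_expenses"] / row["train_miles"], 1)
    expense_per_ton_mile[year] = round(
        100 * row["operating_expenses"] / row["ton_miles"], 1)
    print(year, expense_per_train_mile[year], expense_per_ton_mile[year])
# 1897 100.0 100.0
# 1902 126.5 89.7
# 1907 158.9 95.9
```

Expense per train-mile rises by more than half over the decade, while expense per ton-mile stays within about ten points of its starting level; the ton-mile is much the steadier index of cost.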

To  the  writer  it  seems  that  the  usefulness  of  the  train-mile 
unit  varies  somewhat  according  as  it  is  applied  to  the  pas- 
senger or  the  freight  service,  —  suggesting,  by  the  way,  that 
the  differences  between  freight  and  passenger  service  can- 
not be  removed  through  this  agency.  Considerations  other 
than  cost  play  so  great  a  part  in  passenger  operation  that, 
from  the  last  viewpoint,  it  has  small  importance ;  while  in 
the  freight  service,  if  sufficient  interpretative  data  concern- 
ing locomotive  miles,  gross  and  net  tons  per  train,  etc.,  are 
utilized, it may be of considerable service. From the service
viewpoint the situation is reversed, for in the passenger service
train-miles seem to approach more nearly a necessary relation
to social service than in the freight service.¹ Considered
as an independent unit the locomotive-mile is open to similar
objections.

¹ As an index of service between particular points passenger train-miles
may be of little value, as they would include trains which did not stop at
one or both the points, perhaps, etc.



But  without  further  elaboration  the  conclusion  may  be 
drawn  that  per-mile  units  have  been  too  largely  depended 
upon  in  our  railway  statistics.  Phenomena  which  do  not 
have  a  reasonably  close  relation  to  miles  should  not  be  meas- 
ured in  miles.  We  need  more  careful  analysis  of  essential 
relations  and  variety  of  units  each  adapted  to  the  particular 
case.  A  similar  weakness  might  be  illustrated  from  our  acci- 
dent statistics,  where  the  occasion  is  made  to  serve  as  a  cause 
in  some  columns. 

Take  the  case  of  locomotive-miles.  As  a  matter  of  fact 
locomotive-hours  would  mean  more ;  for,  taking  all  expenses 
connected  with  locomotive  operation  into  consideration,  it 
will  be  found  that  cost  varies  more  with  time  than  distance, 
— interest,  certain  repairs,  fuel,  etc.  But  hours  alone  cannot 
measure  locomotive  performance ;  there  must  be  some  re- 
lation with  product.  What  is  produced  is  "draw-bar  pull," 
or tractive power, so that to really judge efficiency of locomotives
from either the cost or the service viewpoint a unit
of  tractive  power  must  be  used.  Accordingly  a  recent  report 
of  the  Committee  on  Conducting  Freight  Transportation  of 
the  Association  of  Transportation  and  Car  Accounting  Officers 
recommends  the  tractive-power-hour  for  use  by  the  railways. 
Thus  allowance  would  be  made  for  different  tractive  powers, 
delays  between  terminals,  etc. 

Perhaps  the  true  meaning  of  these  different  units  appears 
most  clearly  when  the  railway  is  imagined  as  a  great  organism 
whose  work  is  performed  through  a  series  of  concomitant  but 
subordinate  activities.  Each  department  has  its  function 
and  its  product,  but  that  product  may  be  the  raw  material  for 
another  department  which  carries  the  work  a  step  farther,  — 
perhaps  to  its  consummation  in  the  final  transportation  prod- 
uct. Thus  in  this  hierarchy  various  units  may  be  appro- 
priate for  various  departments  according  to  their  contribu- 



tions.  If  it  is  the  function  of  the  terminal  force  to  move  cars, 
"cars  handled"  is  the  appropriate  unit  —  of  cost,  at  least. 
Obviously  the  ton-mile  is  not  a  unit  applicable  to  the  work  of 
the  mechanical  department ;  that  department  directly  fur- 
nishes tractive  power  on  the  one  hand  and  carrying  capacity 
—  cars,  trains  —  on  the  other.  And  so  on.  The  ton-mile 
caps  the  climax.  But  when  one  desires  to  judge  the  particu- 
lar and  peculiar  efficiency  of  a  subordinate  part  of  the  mech- 
anism its  peculiar  work  must  determine  its  unit.  Just  as 
in  the  case  of  the  cost  viewpoint  some  question  might  be 
raised  as  to  how  far  our  government  should  go,  so  here  it 
would  be  necessary  to  ask  how  intensive  a  regulation  is  de- 
sired to  determine  how  many  subordinate  units  are  necessary. 
Not  only  is  the  per-mile  average  overworked,  but  also  the 
simple arithmetic average is so used as to be a very relic of
barbarism.  It  is  hardly  necessary  to  point  out  its  limita- 
tions. As  to  the  particular  point  now  involved  it  fails  in 
not  indicating  the  weight  and  distribution  of  the  factors 
averaged.  Why,  then,  not  make  some  practical  use  of  such 
well-known  statistical  devices  as  the  weighted  average,  the 
mode,  and  the  median  ?  No  average  wage  for  all  employees  is 
given  ;  a  weighted  average  would  be  good.  The  mode  would 
be  best  for  the  average  trip  and  haul.  Several  shortcomings 
in  the  most  used  units  of  railway  statistics  might  be  capable 
of  partial  remedy  by  the  adoption  of  more  illuminating  aver- 
ages, if  only  the  returns  were  made  more  analytically. 
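The difference between these averages may be seen in a small computation. The sketch below uses invented figures throughout (the wage classes and the lengths of haul are hypothetical, chosen only for illustration) and relies on Python's standard `statistics` module.

```python
from statistics import mean, median, mode

# Hypothetical wage classes: (daily wage in dollars, number of employees).
wage_classes = [(1.50, 600), (2.00, 300), (4.00, 100)]

# Simple arithmetic average: every class counts alike, whatever its size.
simple_average = mean(wage for wage, _ in wage_classes)

# Weighted average: each class counts in proportion to its employees.
total_employees = sum(count for _, count in wage_classes)
weighted_average = sum(wage * count for wage, count in wage_classes) / total_employees

# Hypothetical lengths of haul in miles, one entry per shipment.
hauls = [12, 15, 15, 15, 18, 20, 22, 300, 450]

print("simple average wage:  ", simple_average)
print("weighted average wage:", weighted_average)
print("mean haul:            ", round(mean(hauls), 1))
print("median haul:          ", median(hauls))
print("modal haul:           ", mode(hauls))
```

In this invented case the simple average wage exceeds the weighted one because the small, highly paid class counts as much as the large, poorly paid one. Likewise two long hauls pull the mean haul far above the median and the mode, which better describe the typical shipment.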

REVIEW 

1.  In  what  way  is  the  discussion  of  units  in  this  selection  related  to 

(1)  the  purpose  for  which  the  units  are  used? 

(2)  the  distinction  between  "simple"  and  "complex"  units? 

(3) statistical basis for measuring costs; for measuring "service"?

2. What alternative units to "train-mile" are suggested and for
what purpose?


CHAPTER   IV 

ILLUSTRATIONS  OF  METHODS   IN   COLLECTING 
STATISTICAL  DATA 

Study of Wages — Method ¹

With  a  view  of  supplementing  the  returns  presented  in 
the  Report  on  Manufactures  of  the  Twelfth  Census,  in  re- 
gard to  earnings  of  employees  making  a  more  precise  classi- 
fication of  wages,  the  Census  Office  in  September,  1901,  de- 
termined to  undertake  a  special  investigation.  .  .  . 

1.  Scope  and  Principles  of  the  Investigation. — Owing  to 
the  limitations  of  time  and  the  lack  of  established  methods 
of procedure which could be confidently relied upon, it was
determined  to  limit  the  scope  of  the  special  wage  inquiry  to  a 
few  industries,  and  to  confine  the  treatment  of  the  data 
recorded,  as  far  as  possible,  to  a  single  form.  As  the  method 
adopted  by  the  Twelfth  Census  for  calculating  the  number 
of  employees  sharing  in  the  total  reported  earnings  differs 
from  that  adopted  in  1890,  so  that  the  data  obtained  for  these 
two  years  are  not  strictly  comparable,  it  was  determined  to 
extend  the  inquiry  to  1890  as  well  as  1900.  The  principles 
controlling  the  investigation  are,  briefly,  as  follows : 

(1)  Restriction  of  the  inquiry  to  a  few  stable  and  normal 
industries. 

(2)  Collection  of  actual  rates  of  wages. 

1  Adapted  with  permission  from  "Employees  and  Wages,"  Twelfth  Census 
of the United States, Taken in the Year 1900, 1903, Davis R. Dewey, "Report,"
pp.  xiv-xx. 




(3)  Classification  of  employees  by  rates  of  wages,  and  as 
far  as  possible  by  occupations. 

2.  Wages  as  Measured  by  Earnings  and  by  Rates.  —  There 
are  two  statistical  measures  used  in  representing  the  reward 
of  labor,  commonly  termed  wages :  First,  earnings  or  the 
income  received  in  a  given  period  of  time,  irrespective  of  the 
number  of  hours  or  days  actually  worked ;  second,  rates 
which  express  the  amount  paid  for  work  during  a  given  unit 
of  time,  as  an  hour,  a  day,  a  week,  etc.  Each  of  these  meas- 
ures is  of  value  to  the  student  of  economic  conditions.  The 
first is the compensation actually received in a given period
of time without regard to unemployment, occasioned by illness,
strikes, industrial depression, or other causes; the second
is  the  earning  power  in  a  given  unit  of  time.  If  employment 
were  regular  and  constant,  these  two  methods  might  be  used 
interchangeably  —  rates  could  be  calculated  from  earnings 
and  earnings  from  rates.  Employment  is  not  regular  and 
constant,  however,  because  of  interruptions  due  to  either 
individual  or  industrial  conditions.  Of  the  two  measures, 
at  the  present  stage  of  economic  conditions,  earnings  are  of 
the more interest; but to ascertain the earnings of individual
employees  for  any  period  of  time  greater  than  a  week  is  al- 
most impossible.  The  earnings  as  given  in  the  Report  on 
Manufactures  of  the  Twelfth  Census,  are  for  a  mass  of  work- 
men whose  identity  cannot  be  preserved  from  week  to  week 
or month to month; as has been seen, the number of employees,
among whom the total earnings are divided, is an
average  number,  and  to  that  extent  the  resulting  computa- 
tions are  only  approximate. 
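The relation between the two measures may be put in a few lines. This is a sketch under stated assumptions: the function names and all the figures are hypothetical, and amounts are in cents so that the arithmetic is exact.

```python
def weekly_earnings(hour_rate_cents, hours_worked):
    """Earnings: what is actually received for the time actually worked."""
    return hour_rate_cents * hours_worked


def implied_hour_rate(earnings_cents, hours):
    """Rate recovered from earnings by dividing by a number of hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return earnings_cents / hours


# Hypothetical employee: 20 cents an hour, normal week of 60 hours.
full_week = weekly_earnings(20, 60)   # regular and constant employment
short_week = weekly_earnings(20, 48)  # 12 hours lost through illness or slack work

print(full_week, short_week)
print(implied_hour_rate(short_week, 48))  # actual hours known: true rate recovered
print(implied_hour_rate(short_week, 60))  # normal hours assumed: rate understated
```

With regular employment either measure yields the other. Once time is lost, earnings divided by the normal hours understate the rate, which is why the rate cannot be inferred from earnings alone.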

The  earnings  of  even  a  single  week  may  be  misleading, 
especially  where  no  record  of  time  is  kept  by  the  manage- 
ment. The  establishment  may  have  shut  down  for  a  portion 
of  a  day ;  work  in  a  particular  department  of  a  mill  may  have 



been  slack,  although  as  a  whole  the  establishment  was  running 
full  time ;  or  there  may  have  been  an  exceptional  amount 
of  illness  at  one  period  as  compared  with  another. 

The  present  inquiry,  therefore,  is  concerned  primarily 
with  rates,  earnings  being  used  only  when  the  data  in  regard 
to  rates  are  defective  or  require  further  interpretation.  Sta- 
tistics of  rates,  however,  reveal  only  a  part  of  the  picture; 
the  complete  situation  can  be  described  only  when  the  amount 
of  time  worked  for  at  least  a  year  is  known,  and  even  this 
should  be  supplemented  by  a  knowledge  of  prices  in  order  to 
determine  the  value  of  the  compensation  as  measured  in 
the  commodities  purchased.  These  latter  inquiries  must 
be  supplementary ;  there  is  no  way  to  combine  in  one  in- 
quiry all  the  elements  for  a  complete  presentation  of  wage 
statistics. 

3.  The  Schedule  of  Questions.  —  In  order  to  carry  out  the 
purpose  of  this  inquiry  the  special  schedule  on  the  follow- 
ing page  was  drafted. 

4.  Sections  of  the  Country  Covered.  —  The  work  of  secur- 
ing the  data  called  for  by  this  schedule  was  intrusted  to  spe- 
cial agents  who  were  instructed  to  visit  certain  manufactur- 
ing establishments  in  the  respective  territories  to  which  they 
were  assigned,  care  being  taken  to  select  essentially  manu- 
facturing localities.  This  restriction,  together  with  lack  of 
sufficient  time  to  make  a  more  thorough  canvass,  explains 
the  absence  of  returns  from  the  States  classed  in  the  census 
reports  as  "Western";  but  although  the  report  is  to  that 
extent  deficient,  affording  no  basis  for  a  comparison  of  wages 
between  that  section  and  other  parts  of  the  country,  it  is 
believed  that  the  main  results  of  the  investigation  are  not 
thereby  seriously  impaired.  Fortunately,  returns  were  se- 
cured for  a  few  industries  for  the  Pacific  States. 

[Schedule of questions. Legible column headings: Number of
persons; Sex, 16 years and over; Sex, under 16 years; Per hour,
day, week, month, or year; Hours of labor per day; Hours of
labor per week; Piece or day work; ... helpers, note same and
separate.]

5. Industries Investigated. — The inquiry was limited to
34 industries, nearly all of a permanent character, which
are not violently affected by seasonal influences. They are:

Agricultural implements.          Glass.
Bakeries.                         Iron and steel.
Breweries.                        Knitting mills.
Brickyards.                       Lumber and planing mills.
Candy.                            Paper mills.
Car and railroad shops.           Pianos.
Carpet mills.                     Potteries.
Chemicals.                        Printing.
Cigars.                           Rubber.
Clothing.                         Shipyards.
Collars and cuffs.                Shoes.
Cotton mills.                     Silk mills.
Distilleries.                     Slaughtering.
Dyeing and finishing textiles.    Tanneries.
Flour mills.                      Tobacco.
Foundries and metal working.      Wagons and carriages.
Furniture.                        Woolen mills.

In  grouping  the  returns  by  industries,  the  plan  of  classifi- 
cation adopted  by  the  division  of  manufactures  of  the  Twelfth 
Census,  in  which  product  is  the  determining  factor,  has  in 
the  main  been  followed  here.  For  the  purpose  of  analyzing 
wages  in  specific  occupations  this  is  not  a  logical  classification, 
as  there  is  no  inherent  relation  between  products  and  oc- 
cupations ;  some  classification,  however,  is  necessary  in  order 
to  cover  the  most  important  branches  of  industry,  and  the 
grouping  by  manufactured  products  is  chosen  as  the  most 
serviceable  method  available.  Almost  the  only  change  made 
in this report in the regular census industry names is a slight
alteration  of  the  wording  to  make  them  more  definitely  de- 
scriptive of  the  establishments  from  which  pay  rolls  have 



been  secured.  Thus,  the  census  classification  is  "tobacco, 
cigars,  and  cigarettes,"  but  since  no  cigarette  factories  are 
covered in the present investigation the industry is called
"cigars." "Breweries" is used instead of "liquors, malt";
"tanneries"  instead  of  "leather,  tanned,  curried,  and 
finished";  and  other  similar  changes  in  wording  are  made. 
But  in  all  cases  establishments  are  referred  to  classes  corre- 
sponding to  those  shown  in  the  General  Census  Reports, 
except  where  differences  in  product  would  thereby  be  shown 
in  too  great  detail.  Thus,  in  the  Report  on  Manufactures 
of the Twelfth Census, brass foundries, iron foundries, machine
shops, bicycle factories, sewing-machine factories,
typewriter  factories,  etc.,  were  given  separate  classes;  but 
for  the  purpose  of  securing  the  statistics  of  wages  it  is 
believed that the returns can be safely simplified by combining
all these as "foundries and metal working," thus obtaining
numbers of employees engaged in the same occupations
sufficiently  large  to  justify  extended  study  of  the  results. 

The  classification  for  industries  is  made  by  establishments 
as  a  whole.  It  has  not  been  considered  feasible  to  attempt 
to  subdivide  establishments  into  departments,  except  in  the 
case  of  a  few  textile  establishments,  where  the  books  are  so 
kept  that  the  dyeing  and  finishing  departments  can  be 
separated.  This  classification  of  establishments  is  presented 
in  four  general  groups  made  up  of  the  34  separate  industries. 
No  attempt  has  been  made  to  consolidate  the  statistics  in 
these  four  groups,  but  in  the  discussion  and  arrangement  of 
the  statistics  the  similarities  within  some  of  these  general 
classes  have  been  helpful.  The  industries  comprised  in  the 
four  general  groups  are  as  follows  : 

(1)  Textile  mills,  which  comprise  reports  from  carpet  mills, 
cotton  mills,  dyeing  and  finishing  establishments,  knitting 
mills,  silk  mills,  and  woolen  mills. 



(2)  Factories  engaged  principally  in  woodworking  include 
agricultural  implement  factories,  furniture  factories,  lumber 
and  planing  mills,  piano  factories,  and  wagon  and  carriage 
factories. 

(3)  Metal-working  establishments  comprise  car  and  rail- 
road shops,  foundries  and  metal-working  establishments, 
iron  and  steel  mills,  and  shipyards. 

(4)  Miscellaneous  industries  reported  include  bakeries, 
breweries,  brickyards,  candy  factories,  chemical  factories, 
cigar  factories,  clothing  factories,  collar  and  cuff  factories, 
distilleries,  flour  mills,  glass  factories,  paper  mills,  potteries, 
printing  establishments,  rubber  factories,  shoe  factories, 
slaughtering  establishments,  tanneries,  and  tobacco  factories. 

Certain  resemblances  in  materials  or  products  might  serve 
as  a  basis  for  grouping  some  of  the  industries  in  the  last  class ; 
thus,  for  instance,  "bakeries,"  "candy"  factories,  "flour 
mills,"  and  "slaughtering"  establishments,  all  furnish  food- 
stuffs ;  but  similarity  of  product  is  no  reason  why  they  should 
be  grouped  in  wage  statistics.  It  is  not  to  be  expected  that 
two  establishments  exactly  alike  as  regards  labor  conditions 
can  be  found,  but  it  is  believed  that  within  the  industries 
as  finally  determined,  interchange  of  labor  can  be  accom- 
plished to  a  considerable  extent ;  that  is,  each  industry  rep- 
resents a  group  of  establishments  making  similar  products 
by related though diversified processes so that the labor employed
in one establishment is comparable with that in another.

The  three  important  steps  in  wage  investigation  are  collec- 
tion of  data,  tabulation,  and  analysis. 

1.  Pay  Rolls  Copied.  —  In  the  collection  of  data  it  was  de- 
cided to  rely  upon  the  pay  rolls  of  employers ;  only  in  this 
way  is  it  possible  to  secure  returns  from  all  the  constituent 
elements  in  a  given  establishment,  for  it  is  manifestly  im- 



practicable  to  visit  each  separate  employee  to  obtain  a  per- 
sonal return ;  and,  moreover,  it  is  clear  that  the  pay  roll  of 
the  employer  states  in  the  most  precise  form  available  the 
actual  rate  of  pay  of  each  employee.  This  method  removes 
all  opportunity  for  either  exaggeration  or  underestimation 
and  also  the  possibility  of  substituting  a  customary  wage  for 
the  actual  one. 

2.  Representative  Character  of  Returns.  —  An  important 
consideration  in  the  collection  of  data  is  the  amount  of  ma- 
terial required  to  justify  the  construction  of  tables  on  which 
reliable  conclusions  can  be  based.  This  question  of  represent- 
ativeness of  returns  is  fundamental  to  the  proper  develop- 
ment of  wage  statistics.  As  it  is  impossible  to  secure  from 
every  employee  a  return  of  his  actual  wage,  so  it  is  impossible 
to  secure  a  transcript  of  the  pay  roll  of  every  manufacturing 
establishment  in  the  United  States.  Fortunately,  the  prob- 
lem is  not  so  difficult  of  solution  as  it  may  appear.  In  any 
given  locality  there  is  a  strong  tendency  toward  uniformity 
of  wages  in  the  same  occupation ;  if,  therefore,  the  occupa- 
tions are  carefully  designated,  the  number  of  returns  for  a 
given  occupation  need  not  necessarily  be  inclusive  of  all  em- 
ployees engaged  in  the  same  kind  of  work.  The  more  pre- 
cisely the  occupation  is  described,  with  regard  to  sex,  age, 
and  gradations  of  skill,  the  fewer  are  the  numbers  needed. 
It is impossible, however, at the present stage of the development
of wage statistics, to lay down any definite formula as
to  the  exact  proportions  required.  In  this  investigation  the 
Census  Office  has  endeavored  to  secure  a  harmony  in  the 
proportions  of  returns  for  different  occupations,  and  believes 
that  for  most  of  the  occupations  tabulated  the  numbers  are 
sufficiently large to justify the uses to which they are put.

3.  Selection  of  Establishments.  —  Effort  was  made,  both 
by  the  Census  Office  at  the  outset  and  by  the  agents  when 



actually  on  the  ground,  to  select  establishments  which  may 
be  regarded  in  every  respect  as  representative.  It  was  de- 
termined to  secure  returns  from  establishments  having  the 
largest  numbers  of  employees ;  and  to  insure  the  compara- 
bility of  the  statistics  no  establishment  was  chosen  which 
had  been  in  existence  less  than  twelve  years.  Trial  lists  of 
addresses  were  accordingly  prepared  from  the  general  manu- 
facturing schedules  of  1900  on  file  in  the  Census  Office.  In 
the  progress  of  the  work,  however,  various  practical  difficul- 
ties arose  which  made  it  necessary  in  some  instances  to  pro- 
cure pay  rolls  of  small  establishments,  but  in  every  case, 
these  are  well-established  undertakings  and  may  safely  be 
regarded  as  representative.  The  number  of  pay  rolls  utilized 
in  the  compilation  of  the  tables  is  720.  Classified  according 
to  the  number  of  employees,  the  establishments  from  which 
these  pay  rolls  were  secured  are  grouped  as  follows  : 


Number of Employees          Number of
per Establishment            Establishments

Total                             720

Less than 100                     260
100 to 499                        336
500 to 999                         74
1000 and over                      50
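The proportions implied by this grouping are easily computed. In the sketch below the counts are those of the table; the percentage shares are derived here and are not printed in the report, and the name `ESTABLISHMENTS` is illustrative.

```python
# Number of establishments in each size class, as given in the table.
ESTABLISHMENTS = {
    "Less than 100": 260,
    "100 to 499": 336,
    "500 to 999": 74,
    "1000 and over": 50,
}

total = sum(ESTABLISHMENTS.values())

# Share of each size class in the total, in per cent.
shares = {size: 100 * count / total for size, count in ESTABLISHMENTS.items()}

for size, count in ESTABLISHMENTS.items():
    print(f"{size:<15}{count:>5}{shares[size]:>7.1f}%")
print(f"{'Total':<15}{total:>5}")
```

The classes sum to the stated total of 720, and somewhat more than four fifths of the pay rolls come from establishments of fewer than 500 employees, notwithstanding the original intention to prefer the largest establishments.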


4.  Difficulties  Met  by  Special  Agents.  —  It  is  gratifying  to 
note  that  there  was  a  general  willingness  on  the  part  of  em- 
ployers to  furnish  pay  rolls ;  objection  was  a  rare  exception. 
The  difficulties  met  by  the  special  agents  may  be  summarized 
as  follows  : 

(1)  Destruction  of  the  pay  rolls  for  one  of  the  two  periods  : 
This  was  due  either  to  fire  or  to  the  policy  of  a  company  to 
destroy  the  pay-roll  records  after  a  brief  term  of  years. 



(2)  Inaccessibility :  Sometimes  the  pay  rolls  were  stored 
away  in  attics  or  cellars,  requiring  time  and  labor  to  make 
them  available.  Where  the  character  of  the  organization 
had  changed,  the  books  of  the  old  concern  were  often  in  the 
hands  of  some  one  no  longer  interested  in  the  operation  of 
the  new  company.  If  the  old  institution  had  become  a  part 
of  an  industrial  combination,  with  head  offices  at  a  distance 
from  the  particular  plant  visited,  the  superintendent  was 
seldom  willing  to  give  the  information  without  authorization 
by  an  official  of  the  controlling  corporation;  frequently  in 
such a case a visit to the head office was necessary.

(3)  Imperfect  records :  Many  of  the  pay  rolls  were  so  im- 
perfect that  they  were  worthless  for  the  inquiry.  In  some 
of them lump sums were included for contract work without
any  designation  of  the  number  of  employees  working  under 
the  contract ;  in  others  the  earnings  of  helpers  were  consoli- 
dated with  those  of  the  employees  whom  they  helped.  Under 
these  conditions  separate  wages  could  not  be  determined. 
In  establishments  where  piecework  prevailed  it  was  often 
necessary  to  ascertain,  from  small  time  books  kept  by  the 
foremen  of  the  various  departments,  the  time  actually  worked 
by the individual employee — a task demanding patience
and  care.  Only  rarely  did  the  pay  rolls  separately  designate 
children,  even  when  they  were  employed,  and  to  determine 
this point special inquiry generally was necessary; at best
the  information  gathered  and  returned  as  to  the  ages  of 
employees  is  unsatisfactory,  and  it  is  probable  that  the 
actual  number  of  employees  under  16  years  of  age  is  larger 
than  that  reported.  It  was  not  an  infrequent  experience 
for  the  agents  to  find  by  subsequent  inquiry  that  some  of 
the  employees  returned  as  16  years  of  age  and  over  be- 
longed to  the  younger  age  class ;  only  in  States  where  local 
legislation  in  regard  to  school  attendance  is  stringently  en- 



forced  is  the  classification  of  age  of  employees  likely  to  be 
of  much  service. 

5.  Lack  of  Uniformity  in  Pay  Rolls.  —  The  pay  rolls  which 
were  finally  secured  are  not  uniform  or  simple  in  character. 
The  two  principal  sources  of  difficulty  are,  first,  the  variety 
of time units for which rates are returned; and second, the
fact  that  in  many  establishments  no  permanent  record  of 
time  is  kept,  and  for  some  of  the  employees  earnings  only 
are  reported.  Rates  are  reported  by  the  hour,  day,  week, 
half  month,  and  even  by  the  month,  or  year.  Where  earn- 
ings were  returned  the  time  worked  in  some  instances  was 
reported,  making  it  possible  to  determine  the  rate ;  in  other 
cases,  however,  the  time  was  unknown,  and  rate  tabulations 
could  not  be  made.  .  .  . 

6.  Rejections.  —  Whenever  the  wages  returned  for  an 
employee  include  anything  besides  the  actual  compensation 
for  his  own  personal  and  unassisted  services  they  have  been 
rejected,  unless  such  actual  compensation  can  be  definitely 
determined.  For  example,  the  wages  of  a  teamster  furnish- 
ing his  own  horses  are  excluded,  and  so  also  is  the  lump  sum 
reported  as  paid  to  a  workman  with  one  or  more  helpers,  un- 
less the  proportion  received  by  each  is  given. 

Again,  where  it  is  evident  that  the  wages  reported  as 
paid  to  an  employee  were  received  for  work  which  was 
additional  to  and  outside  of  his  regular  duties,  the  return 
for  that  employee  has  been  omitted.  Thus,  in  the  case  of 
a  Sunday  watchman  reported  as  receiving  $2  a  week  and 
working  twelve  hours,  there  can  be  no  doubt  that  this  wage 
of  $2  is  for  work  additional  to  and  outside  of  his  regular 
duties,  and  to  show  a  man  who  earns  $2  for  twelve  hours' 
work as receiving only that amount for a week would be
palpably  wrong. 

The  wages  of  persons  whose  services  were  chiefly  clerical 



in  their  nature  are  omitted,  as  are  those  of  all  salesmen  and 
superintendents. 

Where  average  earnings  are  reported,  instead  of  exact  earn- 
ings or  actual  rates,  such  averages  are  excluded. 
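The grounds of rejection stated above lend themselves to a mechanical test. The following is a hedged sketch only: the record layout `PayRollEntry`, its field names, and the function `is_accepted` are hypothetical, devised here to mirror the four grounds named in the text, and are not the procedure of the Census Office.

```python
from dataclasses import dataclass

# Classes of persons whose wages are omitted from the tabulation.
EXCLUDED_CLASSES = {"clerical", "salesman", "superintendent"}


@dataclass
class PayRollEntry:
    occupation_class: str                     # e.g. "wage earner", "clerical"
    covers_more_than_own_labor: bool = False  # own horses, helpers paid from a lump sum
    own_share_known: bool = False             # personal compensation separately stated
    outside_regular_duties: bool = False      # e.g. a Sunday watchman's extra pay
    is_average: bool = False                  # an average, not an exact rate or earning


def is_accepted(entry):
    """Return True if the entry passes every ground of rejection in the text."""
    if entry.covers_more_than_own_labor and not entry.own_share_known:
        return False
    if entry.outside_regular_duties:
        return False
    if entry.occupation_class in EXCLUDED_CLASSES:
        return False
    if entry.is_average:
        return False
    return True


teamster = PayRollEntry("wage earner", covers_more_than_own_labor=True)
workman = PayRollEntry("wage earner", covers_more_than_own_labor=True, own_share_known=True)
print(is_accepted(teamster), is_accepted(workman))
```

The teamster furnishing his own horses is rejected, while the workman with helpers is retained once the proportion received by each is given.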

7.  Wage  Groups.  —  In  classifying  the  returns  into  groups, 
it  is  desirable  to  choose  a  unit  of  division  small  enough  to 
bring out the essential facts. If the group has too extensive
limits,  it  may  include  employees  of  widely  different  grades  of 
skill  and  compensation,  making  it  difficult  to  discover  changes 
occurring  between  the  two  given  periods  of  time.  The  ideal 
method  would  be  to  arrange  a  series  of  gradations  so  minute 
that  every  employee  would  be  assigned  to  his  actual  rate; 
this,  however,  is  impracticable,  both  on  account  of  the  ex- 
pense and  of  the  difficulty,  under  the  present  limitations  of 
statistical  art,  of  grasping  the  significance  of  tables  so  elabo- 
rate in  detail.  Accordingly,  the  unit  adopted  for  the  tables 
of  this  report  is  50  cents  for  week  rates  and  1  cent  for  hour 
rates.  Never  is  a  difference  of  more  than  50  cents  a  week, 
or 1 cent an hour, necessary to change an employee's standing
in  the  wage  scale  from  one  group  to  another,  and  often  a 
much  smaller  difference  will  produce  such  a  change;  thus, 
for  example,  when  the  rate  is  near  the  upper  limit  of  the  wage 
group,  the  amount  of  increase  necessary  to  remove  it  to  the 
next  higher  group  varies  directly  with  the  distance  between 
the  actual  rate  and  the  upper  limit  of  the  group ;  on  the  other 
hand, the nearer such a rate is to the lower limit of the wage
group,  the  smaller  the  decrease  necessary  to  cause  its  removal 
to  the  group  below. 
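The assignment of a rate to its wage group may be illustrated as follows. The group widths (50 cents for week rates, 1 cent for hour rates) are those of the report; the treatment of the limits (lower limit included, upper limit excluded) is an assumption made here, since the text does not state it, and the function names are illustrative.

```python
from decimal import Decimal


def wage_group(rate, width):
    """Return the (lower, upper) limits of the group containing `rate`.

    Assumption: the lower limit belongs to the group, the upper does not.
    """
    rate, width = Decimal(str(rate)), Decimal(str(width))
    lower = (rate // width) * width
    return lower, lower + width


def increase_to_next_group(rate, width):
    """Smallest increase that carries `rate` up to the next higher group."""
    _, upper = wage_group(rate, width)
    return upper - Decimal(str(rate))


# Week rates in dollars, grouped by 50 cents.
print(wage_group("9.75", "0.50"))
print(increase_to_next_group("9.95", "0.50"))
print(increase_to_next_group("9.55", "0.50"))

# Hour rates in cents, grouped by 1 cent.
print(wage_group("17.5", "1"))
```

A week rate of $9.95 needs only 5 cents to pass into the next group, whereas one of $9.55 needs 45 cents; in no case is more than the full width of the group required.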

8.  Time  Units.  —  The  units  of  time  finally  adopted  as 
the  most  serviceable  for  the  tabulation  of  rates  are  the  hour 
and  the  week.  The  day  unit  has  many  advantages,  but 
little  information  is  supplied  by  day  rates  which  is  not  found 
also  in  hour  and  week  rates.     From  the  week  rate  it  is  pos- 



sible  to  determine  the  maximum  amount  which  a  workman 
can  earn  per  week  in  normal  working  hours,  and  from  the 
hour  rate  it  is  possible  to  discover  increases  in  the  rate  of 
wages per unit of exertion which are due to the shortening of
the  hours  of  labor  per  week  rather  than  to  an  actual  increase 
in  the  weekly  rate  of  pay.  Sometimes,  also,  the  change  in 
the weekly rate is due to a difference in a number of hours
worked  per  week,  the  rate  per  hour  remaining  the  same.  On 
account  of  the  variety  of  the  returns  great  care  has  been  taken 
in  reducing  them  to  a  common  standard  for  purposes  of  pres- 
entation and  comparison. 

It  may  be  remarked  that  there  are  several  causes  which 
may  make  the  change  in  the  wages  of  the  same  persons  appear 
different  in  the  tables  of  rates  per  week  from  those  shown  by 
the  tables  of  rates  per  hour.  Briefly  stated,  these  causes  are 
as  follows  : 

(1)  The  change  of  normal  hours  in  establishments  during 
the  decade. 

(2) The combination of returns from establishments with
different  normal  working  hours  for  the  various  occupations, 
in  which  the  proportions  of  the  returns  of  the  several  establish- 
ments change  from  one  period  to  the  other. 

(3)  The  difference  in  scale  between  the  wage  groups  in  the 
week  and  those  in  the  hour  tabulations,  resulting  in  a  slight 
change  in  the  distribution  of  the  returns  through  the  groups. 
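The first of these causes, a change of normal hours, can be shown numerically. The figures below are hypothetical and the function names illustrative; amounts are in cents.

```python
def hour_rate(week_rate_cents, normal_hours_per_week):
    """Rate per hour implied by a week rate and the normal weekly hours."""
    return week_rate_cents / normal_hours_per_week


def week_rate(hour_rate_cents, normal_hours_per_week):
    """Rate per week implied by an hour rate and the normal weekly hours."""
    return hour_rate_cents * normal_hours_per_week


# Case A: week rate unchanged at $12.00, normal hours cut from 60 to 54.
earlier_a = hour_rate(1200, 60)
later_a = hour_rate(1200, 54)
print(f"hour rate: {earlier_a:.2f} -> {later_a:.2f} cents (week rate unchanged)")

# Case B: hour rate unchanged at 20 cents, normal hours cut from 60 to 54.
earlier_b = week_rate(20, 60)
later_b = week_rate(20, 54)
print(f"week rate: {earlier_b} -> {later_b} cents (hour rate unchanged)")
```

In case A the hour tables would show an advance and the week tables none; in case B the week tables would show a decline and the hour tables none, although the same persons are concerned in both.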

9.  Normal  and  Actual  Working  Time.  —  Normal  time 
is  the  number  of  hours  regularly  worked  under  full  time. 
Actual  time  is  the  number  of  hours  which  a  particular  em- 
ployee actually  works  in  earning  the  amount  of  money  paid 
him  for  the  period  in  question.  Care  has  been  taken  to  dis- 
tinguish between  this  normal  working  time  for  a  factory, 
or  a  department  of  a  factory,  and  the  actual  number  of 
hours  worked  by  each  individual  employee  in  that  factory 



or  department.  In  all  cases  the  rates  published  are  based 
on  the  normal  time.  The  only  use  made  of  the  actual  time, 
when  reported,  is  in  the  computation  of  rates  from  earnings 
or  earnings  from  rates. 

10.  Time  and  Pieceworkers.  —  There  are  two  principal 
methods  of  payment  for  labor  —  payment  for  length  of 
time worked, and payment for quantity of work done, or
piecework.  In  the  preparation  of  statistics  of  wage  rates, 
the  wages  of  time  workers  are  usually  returned  in  practi- 
cally the  form  desired  for  purposes  of  tabulation,  since  the 
basis  of  payment  is  a  certain  amount  of  money  for  a  cer- 
tain length  of  time.  For  pieceworkers,  however,  the  com- 
putation of  rates  is  more  difficult ;  their  wages  are  always 
reported  in  the  form  of  the  amount  paid  on  the  given  pay 
day.  Unless  the  exact  time  worked  in  earning  this  pay  is 
reported,  no  computation  of  the  wage  rate  is  possible ; 
but  when  the  working  time  required  to  earn  the  pay  reported 
is stated, the computation of a time rate is considered justifiable.
For while piecework may be described as a system
under  which  an  employee  sells  to  his  employer  a  specified 
quantity  of  labor,  irrespective  of  the  time  occupied  in  the 
performance  of  that  labor,  and  time  work  as  a  system  under 
which  he  sells  to  his  employer  the  labor  which  he  shall  per- 
form within  a  given  period,  irrespective  of  what  the  quan- 
tity of  that  labor  may  be,  yet  in  each  case  both  the  time 
worked  and  the  quantity  of  work  done  are  taken  into  con- 
sideration in  fixing  the  rate  of  pay.  A  piece  rate  always 
implies  a  time  basis,  being  adjusted  with  reference  to  the 
time  required  by  the  average  workman  for  the  performance 
of  a  given  piece  of  work ;  conversely,  a  time  rate  always 
implies  a  piece  basis,  for  the  workman  under  this  system 
must  usually  perform  a  certain  minimum  of  work  or  lose 
his place. Thus the two systems of payment, although
apparently diverse, are so closely related as to warrant
the  computation  of  time  rates  for  pieceworkers  when  the 
exact  working  time  of  the  pieceworker  is  reported ;  es- 
pecially is  this  true  for  purposes  of  comparison. 

11.  Necessity  for  Computation  of  Rates.  —  Each  line  of  a 
pay-roll  schedule  shows  the  rate  per  hour,  day,  week,  month, 
or  year,  in  some  cases  per  two  weeks,  and  in  one  or  two 
instances  per  quarter  hour,  for  one  or  more  employees  doing 
the  same  work  and  receiving  the  same  wage.  As  the  pur- 
pose is  to  present  tables  showing  rates  per  hour  and  per 
week  (or  when  this  is  impossible,  earnings  per  week),  it  is 
necessary,  when  one  is  given,  to  compute  the  other,  and  when 
neither  the  week  nor  hour  rate  is  given  to  compute  both 
from  the  data  that  are  given.  A  considerable  number  of 
pay rolls show earnings for the period covered by them —
i.e.  a  week,  two  weeks,  or  a  month,  as  the  case  may  be. 
This  is,  of  course,  the  rule  when  returns  are  made  for  piece- 
workers. In  such  cases  the  rates  per  hour  and  week  can  be 
derived  by  computation  only  when  the  exact  number  of 
hours  worked  is  stated  or  the  actual  number  of  days  of 
known  length  is  given.  The  time  worked  to  earn  the 
amount  given  is  never  estimated,  no  attempt  being  made  to 
derive rates from earnings unless the number of hours
worked  to  earn  the  amount  stated  is  definitely  known  for  the 
individual  employee. 

12.  Rules  for  Computation  of  Rates.  —  The  following  are 
the general rules according to which the computation of
rates  is  made : 

(1)  When  the  rate  given  is  per  hour,  the  week  rate  is  ob- 
tained by  multiplying  the  hour  rate  by  the  number  of 
hours  regularly  worked  in  a  week  by  the  employee. 

(2) When the rate given is per day, the hour rate is obtained
by dividing the day rate by the number of hours regularly
worked in a day, and the week rate is then obtained as in
(1). (For exception see section 14, below.)

(3)  When  the  rate  given  is  per  week,  the  hour  rate  is  ob- 
tained by  dividing  the  week  rate  by  the  number  of  hours 
regularly  worked  in  a  week. 

(4)  When  the  rate  given  is  bi-weekly,  a  weekly  rate  is 
obtained  by  dividing  the  bi-weekly  rate  by  2,  and  the  re- 
sulting rate  per  week  is  then  treated  as  in  (3). 

(5) When the rate given is per month, unless for an employee
regularly working every day, including Sunday, a
day rate is obtained by dividing the monthly rate by 26,
and the day rate thus obtained is treated as in (2). In cases
where  a  monthly  rate  is  given  for  an  employee  regularly 
working  every  day  in  the  week,  including  Sunday,  the  rate 
per  day  is  the  result  of  dividing  the  rate  per  month  by  30 
instead  of  by  26. 

(6)  When  the  rate  given  is  per  year,  it  is  first  reduced 
to  a  monthly  rate  by  dividing  by  12,  and  the  monthly  rate 
thus  obtained  is  treated  as  in  (5). 
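The six rules above can be sketched as a single reduction, each longer unit being reduced step by step until an hour rate and a week rate are reached. This is only an illustrative sketch: the function name `compute_rates`, its parameters, and the sample figures are assumptions introduced here, not part of the original report.

```python
from fractions import Fraction


def compute_rates(rate, unit, hours_per_day, hours_per_week, works_every_day=False):
    """Return (hour_rate, week_rate) following general rules (1)-(6).

    `unit` is one of "hour", "day", "week", "two_weeks", "month", "year".
    `works_every_day` marks an employee regularly working 7 days,
    including Sunday (divisor 30 instead of 26 for a monthly rate).
    All names here are hypothetical.
    """
    rate = Fraction(rate)
    if unit == "year":            # rule (6): year -> month
        rate, unit = rate / 12, "month"
    if unit == "month":           # rule (5): month -> day
        rate, unit = rate / (30 if works_every_day else 26), "day"
    if unit == "day":             # rule (2): day -> hour
        rate, unit = rate / hours_per_day, "hour"
    if unit == "two_weeks":       # rule (4): bi-weekly -> week
        rate, unit = rate / 2, "week"
    if unit == "week":            # rule (3): week -> hour
        return rate / hours_per_week, rate
    if unit == "hour":            # rule (1): hour -> week
        return rate, rate * hours_per_week
    raise ValueError(f"unknown unit: {unit}")


# Illustrative figures: 10 hours a day, 60 hours a week.
hour_rate, week_rate = compute_rates(78, "month", 10, 60)
print(float(hour_rate), float(week_rate))
```

Exact fractions are used so that the successive divisions introduce no rounding before the final figures are shown.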

13.  Exception  for  Iron  and  Steel  Industry.  —  The  preva- 
lence of  turn  or  tour  duty  in  the  iron  and  steel  industry  makes 
necessary  some  slight  exceptions  to  the  general  rules  adopted 
for  the  computation  of  wages  in  other  industries.  In  this 
industry  a  turn,  tour,  trick,  or  shift  is  12  hours  long  in 
many  establishments,  one  crew  working  from  noon  till  mid- 
night and  the  other  from  midnight  till  noon.  The  night 
crew  in  a  number  of  plants  works  only  5  days  a  week,  and 
as  those  who  work  at  night  one  week  work  during  the  day 
the following week, an employee puts in only 11 days in
two  weeks.  This  constant  and  regular  variation  in  the 
normal  working  hours  per  week  for  many  establishments 
makes  it  advisable  to  compute  rates  for  the  operative  in 
this industry on the basis of 2 weeks instead of 1, and this
has been done. For such employees as work in turns, 6
days  in  one  week  and  5  the  next,  a  day  rate  is  obtained  and 
multiplied by 11, while for those who work 6 days in each
week, the day rate is multiplied by 12. Otherwise the rates
are computed according to the general rules already given.

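The two-week basis described above amounts to a single multiplication. The following is a minimal sketch; the function name and the sample day rate are assumptions made for illustration.

```python
from fractions import Fraction


def two_week_rate(day_rate, alternates_turns):
    """Two-week rate for an iron and steel operative.

    `alternates_turns` is True for an employee working 6 days in one
    week and 5 the next (11 days in two weeks); otherwise 12 days are
    counted. The name and signature are hypothetical.
    """
    days = 11 if alternates_turns else 12
    return Fraction(day_rate) * days


print(two_week_rate(3, True), two_week_rate(3, False))
```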
14.  Exception  for  Half  Holiday  without  Loss  of  Pay.  — 
Pay  rolls  were  submitted  by  some  establishments  which 
paid  their  employees  for  6  full  days  although  the  plants 
closed  early  on  Saturday  —  at  noon  in  some  cases.  The 
rates  for  this  class  of  establishments  are  somewhat  differ- 
ently computed ;  if  an  hour  or  day  rate  is  returned,  the 
week  rate  is  obtained  by  multiplying  the  rate  given  by  the 
number  of  hours  or  days,  as  the  case  may  be,  in  a  week  of 
6  normal  days.  The  week  rate  so  obtained  is  then,  for  a 
new hour rate, divided by the number of hours normally
worked.  For  example,  a  machinist  may  be  paid  30  cents 
an  hour  for  10  hours  a  day,  60  hours  a  week,  although  the 
plant  where  he  is  employed  closes  regularly  at  noon  on 
Saturdays. The number of hours actually worked by this
machinist  each  week  will  be,  then,  not  60,  but  55.  Since 
he  is  paid  for  a  full  week,  he  really  receives  $18  for  55  hours' 
work,  32.7  cents  an  hour,  although  if  he  worked  anything 
less  than  full  time  he  would  receive  compensation  at  the 
rate  of  30  cents  an  hour.  He  stands  in  the  same  position, 
as  far  as  earnings  are  concerned,  as  the  machinist  who  is 
paid 30 cents an hour, but who must work 60 hours a week;
both  receive  $18  a  week,  but  the  first  gets,  in  addition  to  his 
money  wages,  a  certain  amount  of  time  which  is  his  own. 
This  advantage  is  usually,  if  not  always,  made  contingent 
on  the  operative  working  full  time,  but  as  rates  are  always 
computed  on  the  basis  of  full  normal  time,  that  fact  is  not 
here material. Other things being equal, the first, working
55 hours a week, enjoys an advantage over the employee
working 60 hours, and to show this advantage the above
exception  to  the  ordinary  rules  of  computation  is  made. 
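The machinist example can be checked with a short calculation. The sketch below uses the figures given in the text (30 cents an hour, 60 hours paid, 55 hours worked); the function name is an assumption introduced for illustration.

```python
from fractions import Fraction


def half_holiday_rates(nominal_hour_rate, paid_hours, hours_worked):
    """Return (week_rate, effective_hour_rate) under the half-holiday exception.

    The week rate is the nominal hour rate times the hours paid for;
    the new hour rate divides that week rate by the hours normally
    worked. The name and signature are hypothetical.
    """
    week_rate = Fraction(nominal_hour_rate) * paid_hours
    return week_rate, week_rate / hours_worked


week_rate, hour_rate = half_holiday_rates("0.30", 60, 55)
print(float(week_rate))                   # week rate in dollars
print(round(float(hour_rate) * 100, 1))   # effective hour rate in cents
```

The effective rate works out to 18/55 of a dollar, which is the 32.7 cents quoted in the text when shown to one place of decimals.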

15. Computation of Earnings. — The pay rolls showing
earnings  without  giving  the  actual  time  worked  by  the 
wage  earner,  although  of  secondary  importance,  are  deemed 
too  valuable  to  be  disregarded,  and  the  returns  of  earnings 
have  therefore  been  presented  in  separate  earnings  tables. 
The  only  period  for  which  actual  earnings  can  be  accu- 
rately ascertained  is  that  for  which  they  are  reported,  namely, 
the  period  covered  by  a  single  wage  payment.  In  most 
cases  this  is  a  week,  but,  as  in  the  case  of  rates,  there  is  some 
diversity,  the  period  being  sometimes  a  half-month  or  a 
month. 

For  the  purposes  of  this  inquiry  the  week  is  a  more  satisfac- 
tory period  than  the  month,  as  well  as  a  more  available  one. 
In  any  large  factory  there  will  be  a  considerable  number 
of  men  who  will  be  found  to  have  worked  full  time,  whether 
the  period  be  a  week  or  a  month ;  but  of  those  who  may  be 
considered  regular  employees,  more  will  have  been  absent 
some  time  in  a  month  than  in  a  week,  and  there  will  also 
be  more  old  hands  discharged  or  new  ones  taken  on,  or 
both.  Moreover,  in  a  month  the  number  of  short-time 
men  will  be  greater  than  in  a  week,  and  consequently  the 
total  number  of  employees  reported  will  be  larger.  The 
aggregate  amount  of  lost  time  will  probably  be  about  the 
same  in  one  week  as  in  another,  apart  from  any  general 
shutdown  in  the  entire  factory,  and  the  period  including 
such  a  shutdown  would  not  be  selected  by  the  special  agent. 
Consequently  it  is  believed  that  the  computation  of  earn- 
ings for  a  week  from  the  reports  for  a  longer  period  is  justified. 

For  these  reasons  the  week  has  been  adopted  as  the 
basis  for  the  tabulation  of  earnings,  and  where  the  earnings 
reported are for a longer period they are reduced to the
week basis. To the objection that such a reduction should
not  be  made,  it  is  answered  that  the  reduction  made  in  the 
present  investigation  is  justified  by  two  facts :  First,  the 
number  of  returns  to  which  this  objection  would  apply 
is  very  small ;  and  second,  the  special  agents  in  taking 
these  long-time  pay  rolls  usually  omitted  the  employees 
who  worked  only  a  small  part  of  the  pay  period.  These 
considerations  have  no  effect  on  the  computation  of  rates, 
but  if  the  reduction  of  earnings  for  a  month  to  earnings  for 
a week were more frequent it would affect unfavorably the
value  which  the  earnings  statistics  might  have.  The  rules 
according  to  which  the  earnings  computations  are  made 
are  as  follows : 

(1)  When  earnings  are  stated  for  a  two-week  period, 
those  for  one  week  are  obtained  by  dividing  by  2. 

(2)  When  earnings  are  stated  for  a  month,  they  are  divided 
by  26,  the  number  of  working  days  in  a  month,  and  the 
resulting  quotient  is  multiplied  by  6.  In  cases  where  the 
wage  earners  work  regularly  7  days  a  week  the  divisor 
used is 30 instead of 26, and the resulting quotient is
multiplied by 7 instead of by 6.

(3)  When  rates  are  returned  with  the  exact  time  worked, 
in  addition  to  the  time  normally  worked,  then,  after  the 
card  is  computed  for  rates,  the  earnings  are  obtained  by 
multiplying  the  rate  per  hour  by  the  exact  number  of  hours 
worked  in  the  period  covered  by  the  pay  roll,  and  if  for  a 
period  other  than  a  week  they  are  reduced  to  a  weekly  basis. 
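The three earnings rules can be sketched in the same manner as the rate rules. The function names, parameters, and sample amounts below are assumptions made for illustration only.

```python
from fractions import Fraction


def weekly_earnings(amount, period, seven_day_worker=False):
    """Reduce earnings for a pay period to a weekly basis, rules (1) and (2).

    `period` is "week", "two_weeks", or "month". Names are hypothetical.
    """
    amount = Fraction(amount)
    if period == "week":
        return amount
    if period == "two_weeks":          # rule (1)
        return amount / 2
    if period == "month":              # rule (2)
        if seven_day_worker:
            return amount / 30 * 7
        return amount / 26 * 6
    raise ValueError(f"unknown period: {period}")


def earnings_from_rate(hour_rate, exact_hours, period, seven_day_worker=False):
    """Rule (3): hour rate times exact hours worked, reduced to a week."""
    total = Fraction(hour_rate) * exact_hours
    return weekly_earnings(total, period, seven_day_worker)


print(float(weekly_earnings(78, "month")))
print(float(earnings_from_rate("0.30", 110, "two_weeks")))
```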

16.  Computation  of  Percentages.  —  In  working  percent- 
ages computations  are  carried  to  two  places  of  decimals, 
and  the  second  allowed  to  influence  the  first,  which  is  the 
last  figure  shown.  In  the  case  of  cumulative  percentages 
the  accumulation  is  first  made  and  the  resulting  percentage 
shown to one place of decimals.
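The rounding procedure can be illustrated as follows. The text does not state how the second decimal is obtained or how ties are resolved, so this sketch assumes the figure is cut off at two places and then rounded half up to one place; the function names are likewise assumptions.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def percent_one_place(part, total):
    """Percentage carried to two decimals, then shown to one place.

    Assumption: truncate at the second decimal, then round half up.
    """
    two_places = (Decimal(part) * 100 / Decimal(total)).quantize(
        Decimal("0.01"), rounding=ROUND_DOWN
    )
    return two_places.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def cumulative_percents(counts):
    """Accumulate the counts first, then take each percentage."""
    total = sum(counts)
    running = 0
    result = []
    for count in counts:
        running += count
        result.append(percent_one_place(running, total))
    return result


print(cumulative_percents([1, 1, 1]))
```

Accumulating before taking the percentage keeps the final cumulative figure at 100.0, whereas adding three separately rounded figures of 33.3 would give 99.9.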



REVIEW 

1.  What  distinctions  are  made  between  the  names  which  are 
used  to  describe  the  compensation  which  employees  receive?  Do 
these  agree  with  those  formulated  in  the  Text,  Chapter  IV?  What 
difficulties  are  mentioned  in  securing  records  of  compensation? 
Do  these  seem  real  to  you  ?     Why  ? 

2.  What  bases  are  used  in  grouping  the  industries  for  tabulation  ? 
Do  these  seem  logical  to  you?  Suggest  others.  What  conditions 
seem  to  have  determined  the  grouping  used? 

3.  Under  the  headings  "step  in  wage  investigation,"  "collection 
of  data,"  what  topics  are  discussed? 

4.  What  difficulties  were  encountered  in  the  use  of  pay  rolls  ? 

5.  What  problems  are  suggested  in  the  contention  that  the  re- 
turns must  be  representative? 

6.  What  things  were  considered  and  why  in  fixing  the  wage 
groups  for  tabulation?  In  fixing  the  time  units  for  expression 
of  wage  data?  Do  the  considerations  noted  here  seem  to  you  to 
be  of  general  application,  or  are  they  limited  to  this  particular 
statistical  problem? 

7.  Why  were  the  rates  published  based  on  "normal  time"? 

8.  How  were  the  piece  rates  reduced  to  a  time  basis?  Is  such 
reduction  always  possible? 

9. What rules were followed in computing "rate of compensation"?
Why was a week chosen as the rate period?

Statistics of the United States Shipping Board ¹

¹ Adapted from Secrist, Horace, "Statistics of the United States Shipping
Board," Quarterly Publications of the American Statistical Association, March,
1919, pp. 236-247.

I. Introduction

What is said about the statistics of the United States
Shipping Board has to do primarily, but not solely, with the
Division of Planning and Statistics.

The Division of Planning and Statistics of the Shipping
Board, at the time of its organization, was unique among
government bureaus. It was created in response to an
urgent  need  for  the  development  of  a  plan  and  method  in 
the utilization of American and American controlled foreign
tonnage  in  the  prosecution  of  the  war.  Foresight  and  plan- 
ning were  to  be  and  have  been  the  guiding  principles  in  its 
development.  The  making  of  history,  the  production  of 
finished  and  comparable  statistical  reports  have  constantly 
been  sacrificed  to  the  need  for  day-to-day  statistics  of  use  for 
planning  purposes.  Hence,  statistical  hazards  —  jumps 
in  the  dark,  as  it  were  —  were  taken  when  there  was  only 
the  smallest  chance  of  their  being  justified  when  viewed  from 
any  other  angle  than  the  emergency  which  prompted  them. 
As  fast  as  conditions  were  standardized  the  statistics  were 
improved ;  they  became  more  comprehensive,  and  more 
closely  followed  the  canons  imposed  by  approved  methods. 

II.    The  Problems  to  Be  Met  by  the  Division 

At  the  beginning  of  1918,  the  United  States  had  a  small 
merchant  fleet  of  its  own,  a  nascent  emergency  fleet,  some 
enemy seized and requisitioned neutral vessels. Both in
number and in manner of use, they were inadequate to
guarantee  a  "bridge  of  ships"  either  for  war  or  trade  pur- 
poses. Moreover,  to  leave  them  in  their  accustomed  trades 
would  only  aggravate  the  shortage.  Control  of  imports 
first,  and  of  exports  later  was  imperative.  Moreover, 
Government  control  of  vessels  was  necessary.  This  was 
provided  by  requisition  orders  and  covered  not  only  ves- 
sels building  in  the  United  States  on  American  and  foreign 
account,  but  also  vessels  trading  between  the  United  States 
and  foreign  countries.  Control  both  of  vessels  and  of  com- 
modities seemed  to  guarantee  against  a  wasteful  use 
of United States and United States controlled foreign
shipping. Administrative action and intelligent planning,
however, were necessary in order that economical use might be
realized.  How  was  this  secured  so  far  as  the  Division  of 
Planning  and  Statistics  of  the  Shipping  Board  is  concerned? 

Study  by  the  Commodities  section  of  the  Division  un- 
mistakably revealed  the  importation  into  the  United  States 
of "unnecessary" goods. Such use of ship tonnage could
not  be  defended  in  any  scheme  which  made  "win  the  war 
by  intelligently  using  ships"  its  chief  sanction.  This  fact 
was  patent,  but  to  measure  the  amounts  in  long  tons  of  such 
unnecessary  imports,  often  quoted  in  trade  statistics  in 
values  or  in  containers,  and  the  equivalent  ship  tonnage 
"wasted"  through  such  importation  presented  real  sta- 
tistical problems.  These  were  the  first,  and  continued  to  be 
some  of  the  most  difficult,  statistical  problems  of  the  Division. 

By  statistical  analysis,  consultation  with  the  trade,  with 
the  Army,  the  Navy,  the  Food  Administration,  the  State 
Department,  and  other  Government  agencies,  an  import 
program  was  finally  established.  In  outline,  this  provided 
that  the  War  Trade  Board  should  license  imports  and  that 
the  Shipping  Board  should  provide  the  necessary  ship 
tonnage  to  move  them.  In  working  out  this  program, 
trade  protests  and  diplomatic  objections  had  to  be  met  or 
circumvented.  The  argument,  that  to  cut  imports  saved 
ship  tonnage,  was  true,  but  its  application  was  neither  seen 
nor  welcomed  at  first  by  the  interests  involved.  The  ques- 
tion was  asked  —  and  later,  answered  —  How  much  ton- 
nage? It  was  necessary  to  determine  the  amount  of  sav- 
ings not  only  to  meet  trade  objection,  but  likewise  to  furnish 
a  basis  for  the  assignment  of  tonnage  by  the  Shipping 
Board  so  as  to  guarantee  that  the  import  program  in  its 
civilian and army aspect would be met. To answer the
shipping side of this question required that the following,
among other problems, be studied, and that the results
of  such  study  become  controlling  factors  in  the  daily  ad- 
ministrative routine  of  the  Division,  and  of  the  other  war 
boards  with  which  it  cooperated. 

(1)  The  stowage  of  goods  in  space  and  weight. 

(2)  The  conversion  into  long  tons  of  the  values  and  other 
units  in  which  imports  are  often  expressed. 

(3)  The  turn-arounds  of  vessels,  or  the  time  spent  in 
completing  one  round  trip. 

(4)  The  unit  in  which  to  measure  cargo  capacity,  in  the 
study  of  vessel  utilization. 

(5)  The  relations  between  the  ship  tonnages  in  cur- 
rent use. 

(6)  The  relations  between  the  different  types  of  vessels 
as  carrying  units. 

(7)  The  relations  of  bunkers  and  stores  to  total  ship 
tonnage  in  order  to  determine  the  capacity  for  cargo  ton- 
nage. 

(8) Use and practicability of combination as contrasted
with  solid  cargoes,  and  the  relation  of  the  distribution  of 
necessary  imports  thereto. 

(9)  Suitability  of  vessels  for  various  services,  account 
being  taken,  among  other  things,  of  size,  speed,  perma- 
nent bunkers,  fuel  consumption,  charter  restrictions,  etc. 

(10)  Ballast  movements,  and  underloading  by  space  and 
weight. 

(11)  Distribution  of  ship  tonnage  by  trades  and  services. 

(12)  Vessel  control  by  flag,  charter,  agreement,  etc. 

(13)  Losses  through  marine  risk  and  enemy  action. 

(14)  Acquisitions   to   merchant    fleets   through   building, 
purchase,  charter,  repair,  and  salvage. 

This list, though far from complete, will serve to illustrate
the types of problems with which it was necessary
to deal. The statistical material for their measurement
had,  for  the  most  part,  to  be  created,  or  secured  in  a  crude 
state  from  widely  different  and  frequently  conflicting  sources. 
A  review  of  this  in  terms  of  the  problems  named,  may  be 
interesting.  It  is  impossible,  however,  in  this  short  paper, 
to  develop  fully  any  of  these  topics,  and  to  criticize  from  a 
statistical  point  of  view  the  sources  of  material  and  the 
uses  to  which  they  are  put.  Little  more  can  be  done  than 
to  list  them. 

III.    Sources  of  Material 

(1)  "Cargo  Reports." 

These are reports made by masters of vessels to collectors
of customs on vessels, (a) entering foreign, (b) clearing
foreign, (c) arriving coastwise. They are made in
duplicate,  one  copy  going  to  the  Shipping  Board,  and  one 
being  filed  by  the  collector.  They  show  (using  the  enter- 
ing form  as  an  example),  for  individual  vessels,  the  port  of 
entry,  name,  type,  flag,  port  of  origin,  date  of  clearance, 
ports  of  call,  with  arrival  and  clearance  dates ;  gross,  net, 
and  total  deadweight-tonnage ;  tons  of  bunkers,  water,  and 
stores  on  leaving ;  days  spent  in  port  of  origin ;  deadweight 
for  cargo ;  total  cargo  on  board  in  long  tons  and  cubic  feet ; 
total  capacity  in  cubic  feet  (bale  and  grain) ;  description 
of  cargo,  showing  for  each  of  about  sixty  principal  com- 
modities, port  of  loading,  long  tons  on  board,  and  cubic 
feet  of  space  employed ;  amount  to  be  discharged  at  port 
of  entry  in  long  tons ;  etc. 

These  reports  are  fundamental,  and  supply  source  ma- 
terial on  stowage,  tonnages,  source  of  imports,  solid  and 
combination  cargoes,  turn-arounds,  delays  in  port ;  bal- 
last movements  and  vessel  utilization,  relation  of  total  to 
cargo  deadweight,  etc. 
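To show how a cargo report bears on the relation of total to cargo deadweight and on underloading by weight and space, the record might be sketched as below. The class, its field names, the derivation of cargo deadweight by subtraction, and all figures are assumptions made for illustration; the Division's actual forms and procedures are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class CargoReport:
    """A few fields of a cargo report (hypothetical field names)."""
    total_deadweight_tons: float   # total deadweight tonnage
    bunkers_tons: float            # tons of bunkers on leaving
    water_tons: float              # tons of water on leaving
    stores_tons: float             # tons of stores on leaving
    cargo_long_tons: float         # total cargo on board, long tons
    cargo_cubic_feet: float        # total cargo on board, cubic feet
    capacity_cubic_feet: float     # total capacity, cubic feet

    def cargo_deadweight(self):
        """Deadweight left for cargo after bunkers, water, and stores."""
        return self.total_deadweight_tons - (
            self.bunkers_tons + self.water_tons + self.stores_tons
        )

    def weight_utilization(self):
        """Share of cargo deadweight actually used."""
        return self.cargo_long_tons / self.cargo_deadweight()

    def space_utilization(self):
        """Share of cubic capacity actually used."""
        return self.cargo_cubic_feet / self.capacity_cubic_feet


# Invented figures for a single vessel.
report = CargoReport(8000, 900, 60, 40, 5600, 280000, 350000)
print(report.cargo_deadweight(), report.weight_utilization(), report.space_utilization())
```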



(2)  Application  for  License  for  Bunker  Fuel,  Port,  Sea,  and 
Ship's  Stores  and  Supplies.  —  "Bunker  Form  B-1." 

This  is  a  report  made  out  in  triplicate  by  the  owner,  char- 
terer, agent,  or  master  of  a  vessel,  and  is  presented  to  the 
agent  of  the  Bureau  of  Transportation  of  the  War  Trade 
Board  or  to  the  collector  of  customs.  One  copy  goes  to  the 
Shipping  Board.  Among  other  things,  this  report  calls  for 
the  name,  flag,  type,  speed ;  registered  gross,  net,  and  total 
deadweight-tonnage  of  vessels ;  average  daily  consumption 
of  fuel  in  port ;  owner's  and  charterer's  name,  address, 
and  nationality ;  date  of  charter  party ;  date  of  expiration 
of  charter  party ;  trading  limits,  if  on  time  charter ;  ports 
of  call  on  last  completed  voyage ;  last  port  outside  United 
States  from  which  vessel  cleared ;  description  of  complete 
voyage  which  is  to  be  made ;  etc.  This  report  when  sent 
to  the  Shipping  Board  also  contains  a  statement  of  the 
amount  of  fuel  and  stores  actually  licensed  to  be  put  on 
board. 

This  report,  likewise,  is  fundamental  in  the  work  of  the 
Division,  throwing  light,  not  only  on  the  characteristics 
of  vessels,  but  also  on  their  control,  trading  limits,  and  most 
distinctive  of  all,  on  the  relation  of  coal  consumption  to  the 
voyage  in  question.  By  means  of  it,  the  steaming  radius 
of  a  vessel  and  the  relation  of  total  to  cargo  deadweight 
are  checked  against  other  sources,  or  independently  de- 
termined. 

(3)  Master's  Report  on  Outward  Voyage.     "Bunker  Form 
B-3." 

This  is  a  report  made  out  by  the  master  of  the  vessel  at 
the  time  of  completing  his  voyage  and  provides  for  his 
listing  all  ports  of  call  with  dates  of  arrival  and  departure; 
cargo  and  bunkers  loaded  and  discharged,  and  the  amount 
of fuel on board at place of destination. The report, although
not received until the voyage is completed, is important
in tracing vessel itineraries and periods of turn-around.
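A turn-around, defined earlier as the time spent in completing one round trip, could be measured from a list of port calls with arrival and departure dates. The sketch below assumes that the itinerary of a full round trip has already been pieced together from such reports; the function names, the tuple layout, and the dates are invented for illustration.

```python
from datetime import date


def turnaround_days(calls):
    """Days from first departure to final arrival.

    `calls` is an ordered list of (port, arrival, departure) tuples;
    the first entry carries the departure from the home port and the
    last entry the arrival back at it. The layout is hypothetical.
    """
    first_departure = calls[0][2]
    last_arrival = calls[-1][1]
    return (last_arrival - first_departure).days


def days_in_intermediate_ports(calls):
    """Total days spent in ports of call between the two ends of the trip."""
    return sum((departure - arrival).days for _, arrival, departure in calls[1:-1])


# Invented itinerary for one round trip.
itinerary = [
    ("New York", None, date(1918, 7, 1)),
    ("Liverpool", date(1918, 7, 15), date(1918, 7, 25)),
    ("New York", date(1918, 8, 8), None),
]
print(turnaround_days(itinerary), days_in_intermediate_ports(itinerary))
```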

(4) "Charter Reports."

The Chartering Committee of the Shipping Board approves
charter parties of American and foreign vessels doing
foreign business with the United States and of foreign
vessels doing coastwise business. A daily report on charters,
approved, disapproved, and cancelled, is made to the Division,
and gives, in addition to descriptive facts of vessels,
the names of owner, chartered owner, operative charterer;
form  and  duration  of  charter ;  trading  limits,  if  on  time 
charter,  etc.  By  means  of  this  and  other  reports,  record 
is  kept  and  studies  made  of  American  vessels  chartered  to 
foreigners ;  foreign  vessels  chartered  to  the  United  States 
Shipping  Board  or  United  States  citizens ;  foreign  vessels 
chartered  to  foreigners  under  conditions  approved  by  the 
Chartering  Committee ;  and  foreign  vessels  trading  with 
the  United  States  which  are  specifically  required  to  return 
to  the  United  States. 

(5)  "Allocation  Sheets"  —  Ship  Control  Committee. 

Reports  from  the  Ship  Control  Committee  for  the  Ship- 
ping Board  at  New  York  are  received  daily.  These  show 
the  allocations  daily  made,  the  operative  companies,  and 
for  trans-Atlantic  regions,  vessels  en  route  each  way ;  those 
in  home  ports ;  those  in  foreign  ports,  and  the  account 
upon  which  each  is  moving.  Somewhat  similar,  but  far 
less  satisfactory,  reports  are  received  from  the  committee 
on  vessels  trading  with  South  America,  the  West  Indies, 
and  Caribbean  points,  and  in  the  Pacific. 

(6)  Reports  from  the  Division  of  Operations,  United  States 
Shipping  Board. 



The  Division  of  Planning  and  Statistics  relies  on  the 
Division  of  Operations  for  a  large  amount  of  data  on  the 
measurements  of  vessels,  ownership,  assignment  for  op- 
eration, and  charter  relations  to  the  board.  These  and 
other  data  are  made  available  through  printed  or  mimeo- 
graphed reports,  or  through  daily  digests  of  the  corre- 
spondence of  the  Division. 

(7)  Reports  from  the  United  States  Shipping  Board  Emer- 
gency Fleet  Corporation. 

Likewise,  the  Division  receives  from  the  Emergency 
Fleet  Corporation,  among  other  things,  daily  reports  on 
keels laid, launchings, and deliveries, contract measurements
of vessels. Actual measurements are later substituted
after vessel trials are made, and the itineraries,
loading  factors,  and  general  utilization  of  Emergency  Fleet 
vessels  watched  in  the  same  way  as  they  are  for  others. 

(8)  Reports  from  the  Bureau  of  Navigation,  Department 
of  Commerce. 

The  monthly  and  yearly  reports  on  American  vessels 
documented,  registered,  given  signal  letters,  and  other- 
wise listed  by  the  Bureau  of  Navigation  are  exceedingly 
helpful  in  developing  records  of  our  own  merchant  marine. 
Moreover,  the  bureau's  reports  on  shipbuilding  and  losses 
are helpful in distinguishing private from public building,
and  for  purging  the  Division's  files  of  vessels  lost  through 
marine  risk  and  enemy  action. 

(9)  Telegraphic  Records  of  Vessel  Movements. 

A.    Cablegrams  from  American  Consuls. 

Daily  cablegrams  are  received  by  the  Division  from 
certain  foreign  ports,  and  weekly  cablegrams  from  others, 
giving  name,  flag,  and  principal  cargo  of  vessels  arriving 
from or departing for the United States. This information
is significant for purpose of vessel loading and allocation,
for determining the degree to which the import program
is currently being met, and for providing cargoes
at  home  and  at  foreign  ports. 

B.  The  Naval  Communication  Service. 

The  Navy  Department,  through  the  Naval  Communi- 
cation Service,  secures  daily  by  telegraph  or  cable,  infor- 
mation on  arrivals  at  and  departures  from  American  and 
from  a  number  of  foreign  ports.  This  information  is  dis- 
tributed in  printed  form  daily,  and  constitutes,  for  opera- 
tive purposes,  probably  the  most  significant  single  source 
of  information  on  vessel  movement  available  to  the  Di- 
vision. The  facts  given  include  name  of  vessel,  flag,  net 
tonnage,  dates,  and  places  of  departure  or  arrival.  Oc- 
casionally facts  on  cargo  are  also  included,  but  these  are 
far  too  meager  and  uncertain  to  serve  as  satisfactory  data 
on  this  topic. 

C.  Other  Cablegrams. 

Cablegrams  to  and  from  the  State  Department,  War 
Trade  Board,  Shipping  Board,  Division  of  Planning  and 
Statistics,  Division  of  Operations,  and  the  Ship  Control 
Committee serve currently to correct the files of the Division
on the operative status, charter and ownership control
of vessels, to indicate the types of problems that are
to be solved, and to suggest statistical summaries and reports
which are helpful to that end.

These  sources  of  information  and  the  problems  upon 
which  they  bear  have  to  do  primarily  with  the  domestic 
side  of  the  shipping  problem,  in  so  far  as  it  is  handled  by 
the  Division  of  Planning  and  Statistics.  There  is,  how- 
ever, the  international  side  which  should  receive  attention, 
both  as  to  source  of  material  and  the  problems  involved. 



IV.    The  Division  and  the  American  Section  of  the 
Allied  Maritime  Transport  Council 

As  events  have  turned  out,  the  Division  of  Planning  and 
Statistics  is  the  primary   agent  through  which   American 
shipping  facts  are  furnished  to  the  American  Section  of  the 
Allied Maritime Transport Council, and to the allied nations
generally  as  represented  in  the  Secretariat  of  the  Council 
itself.     Early  in  the  summer  of  this  year,  it  became  evident 
that  the  American  Section  of  the  Allied  Council  was  not 
currently   receiving  from   the   United   States   the   material 
that  it  needed  to  present  fully  the  shipping  situation  of  the 
United  States  at  the  meetings  of   the   Council  in  London. 
Mr. Rublee, one of our representatives in London, came to
the  United  States  in  June  of   this  year  to  present  the  case 
of  the  American  Section  and  it  was  not  until  the  time  of 
his  visit  that  the  obligation  of   the  Shipping  Board  to  the 
American    Section    was    fully    realized.     Domestic    affairs, 
the  newness  of  the  work  of  the  Division,  the  paucity  of 
records,  and  the  insistence  of  those  at  home  for  informa- 
tion all  served  to  keep  the  outlook  of  the  Division  domestic. 
Following  Mr.  Rublee's  visit,  however,  Mr.  E.  F.  Gay,  di- 
rector of  the  Division,  sent  the  writer  to  London  to  study 
the  needs  of  the  American  Section  as  they  were  related  to 
tonnage  matters,  to  provide  machinery  for  meeting  them, 
to determine the ways in which the American and other
sections  of  the  Council   could  serve   the  Shipping  Board, 
and  to  establish  the  necessary  connections  and  the  required 
machinery  for  securing  these  services.     Later  Mr.   W.   S. 
Tower  of  the  Commodity  Section  of  the  Division  was  like- 
wise sent  to  London  to  study  the  import  and  export  phases 
of  the  problem. 

As a result of these visits, and of the more thorough
knowledge of the problems of the American Section and
of the Shipping Board, a large part of the activities of the
Division  has  been  devoted  to  a  consideration  of  the  shipping 
problem  in  its  international  aspect.  Information  on  the 
composition  of  the  merchant  marines  of  the  allied,  enemy, 
and  neutral  countries ;  on  movements  and  cargoes  of  allied 
vessels ;  on  losses  through  marine  risks  and  enemy  action ; 
on  shipbuilding,  repairs,  salvages,  charter  rates,  and  amounts 
of  chartered  tonnage,  etc.,  is  furnished  this  Division  by  or 
through  the  American  Section.  Information  is  also  supplied 
by  the  British  Ministry  of  Shipping,  the  British  Admiralty, 
and  Lloyds.  Some  of  it  comes  by  cable,  and  some  by 
embassy  pouch,  but  it  is  all  illuminating  to  the  shipping 
problems  of  the  world  and  vital  in  the  determination  of 
our  part  in  them. 

In  supplying  information  on  shipping  problems,  the 
Division  of  Planning  and  Statistics  fully  reciprocates.  It 
sends the American Section, and through it the allied countries
generally, either by cable or pouch, current data on
American  shipbuilding,  American  losses  through  enemy 
action  and  marine  risk,  repairs  to  American  and  foreign 
vessels,  employment  of  American  vessels  and  foreign  ves- 
sels controlled  by  us,  inventory  facts  on  American  Mer- 
chant Marine ;  required  imports  in  long  tons,  and  the 
ship  tonnage  necessary  to  move  them,  together  with  state- 
ments in  detail  of  the  types,  flags,  charter  relations,  and  per- 
formances of  the  vessels  involved.  These  reports  are  by 
individual  vessel,  as  well  as  by  aggregates,  and  follow  the 
forms drafted by the representatives of the Allied Council
as bases for employment, and loss and gain statements.

So long as the shipping problems of the Allies are adjusted
by an international council, the Division can expect
to receive from and to furnish to the American and
other  sections  of  the  Council  current  information  on  mer- 
chant shipping.  The  open,  frank,  give-and-take  philosophy 
which  has  characterized  the  relations  of  the  Council  and 
the  Shipping  Board  is  illustrative  of  the  unity  of  purpose 
with  which  nations  will  associate  themselves  for  a  common 
end.  As  a  result  of  the  cooperation,  the  American  Section 
in  London  is  compiling  master  files  of  American  vessels 
(it has full access to the files of the British Ministry of
Shipping, Intelligence Branch) and the Division of Planning
and  Statistics  has  built  up  both  master  and  movement  files 
on  practically  the  entire  sea-going  merchant  tonnage  of 
the world. It not only has developed the machinery for
efficiently  prosecuting  the  war,  but  also  has  collected  facts 
which,  if  continued,  will  be  of  value  in  promoting  trade. 

V.    The  Division  and  Other  Government  Departments 

The  cooperation  of  the  Division  with  other  Government 
Boards should be briefly mentioned. Probably the Departments
with which it most fully cooperates are the War
Department  and  the  Ship  Control  Committee  of  the  Board. 
It  furnishes  both  organizations  periodic  employment  state- 
ments covering  American  and  foreign  controlled  vessels, 
and  special  studies  of  vessels  suitable  for  use  in  Army  serv- 
ice, when  judged  by  standards  of  physical  capacity,  charter 
limitations, etc. A semi-monthly ship balance sheet of
tonnage  employed  and  required  serves  to  show  not  only 
tonnage  distribution,  but  also  the  nature  of  the  excesses 
and  deficiencies  in  tonnage,  in  trade,  and  in  Army  uses. 
From  this  statement,  the  Army  knows  currently  the  amount 
and  character  of  tonnage  in  trade  and  is  in  a  position  to 
present its case for transfer of vessels to war use. Similarly,
the Ship Control Committee is able to view the
trade  situation  vessel  by  vessel,  and  as  a  whole,  and  in- 
telligently to  allocate  tonnage  between  trades,  commodities, 
and special services.

The  Division,  too,  has  closely  cooperated  with  the  War 
Trade Board, in the administration of the
Trading-with-the-Enemy Act, and in the collection and preparation of
shipping  data  on  foreign  countries  as  a  basis  for  negotiating 
and  administering  trade  and  shipping  agreements. 

VI.    The Division's Activities Illustrated by Periodic
and  Special  Reports 

The  scope  of  the  Division's  activities  may  be  further 
illustrated  by  listing  a  few  of  the  many  subjects  covered 
in  its  statistical  reports  and  memoranda. 

1.  Employment  of  United  States  vessels  and  foreign 
vessels  controlled  by  the  United  States  by  type,  form  of 
control,  by  trade  use,  and  assignment. 

2.  Private  and  public  charter  control  of  foreign  ves- 
sels and  vessels  under  agreement  with  the  Shipping  Board. 

3.  Utilization  by  space  and  weight  factors  of  vessels 
arriving  and  clearing  foreign. 

4.  Merchant  marine  of  the  American  and  the  principal 
foreign  countries  1914  to  date,  showing  the  losses  and  gains 
by  causes,  trade  distribution,  and  movement. 

5.  Internment  and  seizures  of  enemy  vessels  in  Ameri- 
can and  foreign  ports. 

6.  Trade  of  English  controlled  tonnage  between  South 
America  and  England ;  between  Australasia  and  all  parts 
of  the  world,  between  Africa  and  northern  Europe. 

7.  American  coastwise  vessels  and  commodity  movement. 

8.  Import  and  export  distribution  of  American  and 
American  controlled  foreign  tonnage. 


9.  The  employment  of  the  merchant  fleets  of  Holland, 
England,  etc. 

10.  Performance  of  vessels  built  for  the  Emergency 
Fleet  Corporation. 

The  above  reports  are  illustrative  only,  and  in  no  way 
exhaust  the  topics  upon  which  reports  are  periodically  and 
occasionally  made. 

VII.    The Division as a Repository of Shipping Information

A  word  should  be  said  of  the  Division  as  a  repository  of 
shipping  information.  The  significant  descriptive  facts 
of  the  merchant  marines  of  the  important  countries  of  the 
world  are  in  the  files  of  the  Division.  Moreover,  they  con- 
tain for  practically  all  American  vessels,  and  for  foreign 
vessels  controlled  by  the  United  States,  the  itineraries  from 
April  1  to  date,  adjusted  to  a  graphic  scale,  distinguishing 
time  in  port  for  ports  of  entry  and  clearance,  time  at  sea, 
cargo  carried,  and  space  and  weight  utilized.  Similar  facts, 
but  less  complete,  are  available  for  practically  the  entire 
merchant  fleet  of  the  world,  from  June,  1918,  to  date,  whether 
trading  with  the  United  States  or  not.  The  cargo,  bunker, 
and  master's  reports  contain  basic  data  for  far  more  com- 
plete studies  on  turn-arounds,  loading,  ballast  movement, 
port  delays,  etc.,  than  it  has  been  possible  to  make  during 
the  war.  It  is  the  hope  of  the  writer  that  these  data,  which 
have  been  of  distinct  service  in  the  control  and  utilization 
of  our  merchant  fleet  during  the  war,  will  be  more  fully 
utilized  for  the  development  of  shipping  facts  vital  to  the 
peaceful  prosecution  of  trade. 

VIII.    The  Division  in  Peace  Times 

Concerning  the  peace  functions  of  the  Division,  a  word  is 
necessary. 


Changes  in  source  material  and  in  methods  will  be  nec- 
essary in  order  for  the  Division  to  retain  during  peace  its 
distinctive  and  unique  character.  These  changes  must 
be  made  in  the  same  thoughtful  manner  that  was  used  in 
placing  the  Division  on  a  war  basis.  There  is  room  among 
the  present  trade  and  commerce  bureaus  of  the  Government 
for  a  Division  of  Planning  and  Statistics  of  the  Shipping 
Board,  but  in  order  to  guarantee  against  serious  over- 
lapping of  function,  jealousies,  conflicts  of  jurisdiction, 
and  waste  of  public  money,  the  same  readiness  to  adjust 
means  to  ends  which  has  characterized  the  work  of  the 
Division  during  its  year  of  activity  must  be  adopted  by  all 
of  the  trade  bureaus  having  to  do  with  foreign  commerce 
and  shipping,  and  out  of  their  cooperative  endeavor  must 
come  a  new  alignment  of  function  and  duties  in  order  to 
guarantee  from  each  distinctive  and  unique  contributions. 

Points  to  Be  Considered  in  the  Use  and  Form 
of Questionnaires ^

Object  of  an  Inquiry.  —  A  problem  is  half  solved  when  it 
is  clearly  stated.  Write  yourself  a  memorandum  stating 
what  action  depends  upon  having  this  information;  show 
how  the  action  hinges.  Outline  your  plan  for  translating 
the  replies  into  shape  for  decisive  action. 

Existing Data. — Before starting anything new, find out
what  has  been  done  already.  This  covers,  in  the  first  place, 
your  own  offices ;  then,  the  regular  peace  time  statistical 
offices  of  the  Government ;  third,  the  reliable  sources  of 
trade  statistics ;  and  fourth,  the  special  investigation  by 
war  agencies. 

^  Adapted  with  permission  from  Weekly  Statistical  News,  Central  Bureau 
of  Planning  and  Statistics,  Washington,  D.  C,  No.  9,  Nov.  8,  1918,  pp.  4-7. 


1.  Standard Size. — So far as possible use 8½ × 11 paper
(or  multiples  of  this  size,  if  necessary).  This  will  not  only 
be  most  convenient  to  file  but  also  will  enable  the  respondent 
to  expand  the  report  when  necessary  by  adding  extra  sheets 
of  commercial  size  typewriting  paper.  Occasionally  it  will 
be  desirable  to  use  a  small  card  which  can  be  filed  directly 
in  a  card  catalogue.  This  device  should  be  used  only  after 
very  careful  consideration  of  all  the  limiting  factors.  The 
filing equipment to be used must be considered, also, the
arrangement  in  the  files  and  the  arrangement  on  the  card 
adjusted  to  facilitate  filing  and  finding,  etc. 

2.  Medium  Weight  Stock.  —  When  questionnaires  are 
printed,  a  medium  weight  paper  should  be  used.  It  should 
be  heavy  enough  to  handle  easily  and  to  stand  well  in  the 
files. 

3.  Watermark.  —  Prefer  a  paper  without  a  watermark, 
so  that  blue  prints  may  be  made  directly  from  the  original 
should  it  become  desirable. 

4.  Typography.  —  Forms  should  be  printed  rather  than 
mimeographed,  except  in  emergencies. 

5.  Separate  Sheets.  —  If  the  questionnaire  covers  sev- 
eral sheets  do  not  fasten  them  together  in  a  book,  as  this 
makes it difficult, if not impossible, to utilize the typewriter
and  the  carbon  paper  process  of  manifolding. 

6.  Binding  Margin.  —  Leave  a  sufficient  margin  for 
binding, preferably at the side, but at the top when wide
tabular  arrangements  are  necessary. 

7.  Title.  —  Each  questionnaire  should  have  a  distinc- 
tive title,  which  should  be  as  brief  as  possible,  to  facilitate 
reference,  etc.  It  should  include  the  name  and  address 
of  the  office  issuing  the  questionnaire  and  some  indication 
of  its  scope.  Usually  the  report  should  be  as  of  a  given 
date,  or  covering  a  specified  period. 

8.  Sheet Identification. — Each sheet should carry data
adequate  to  identify  it  in  the  event  of  its  becoming  separate 
from  its  fellows,  e.g.  form  number,  name  of  respondent, 
and  date  of  report. 

9.  Pagination.  —  In  the  upper  corner  opposite  the  bind- 
ing side  of  the  sheet,  place  the  page  or  sheet  number.  If 
binding  margin  is  at  the  top,  place  page  number  at  the 
bottom. 

10.  Column  Designation.  —  Where  a  columnar  form  is 
likely  to  extend  beyond  one  page,  designate  the  columns 
by letters or figures so that sheets of plain paper may be
added  by  the  respondent,  using  the  letters  in  lieu  of  printed 
box  headings. 

11.  Question  Designation.  —  So  far  as  practicable  number 
or  letter  each  question  and  each  distinct  part  of  a  question 
so  as  to  abbreviate  reference  in  correspondence.  In  general, 
letter  the  columns  and  number  the  rows. 

12.  Typewriter Limitations. — Facilitate the use of the
typewriter by adjusting spaces, etc., to meet the limitations
of standard typewriters. Horizontal lines should be
one-sixth  of  an  inch  apart  or  multiples  of  that  distance. 

13.  Abbreviation.  —  So  far  as  possible  arrange  entries  which 
must  be  repeated  so  that  a  brief  identification  will  take  the 
place  of  a  long  description  in  all  entries  after  the  first. 

14.  Unit.  —  Make  sure  that  the  unit  of  every  denominate 
number  will  be  clearly  indicated  on  the  return. 

15.  Standard  Unit.  —  Whenever  possible  specify  the 
unit  to  be  used,  so  that  the  returns  can  be  tabulated  with- 
out conversion. 

16.  Common  Unit.  —  Whenever  an  entire  page,  column, 
or  line  with  several  entries  is  devoted  to  statistics  of  a  single 
denomination,  show  the  unit  once  for  all  at  the  beginning 
of  the  page,  column,  or  row. 


17.  Arrangement  in  Categorical  Entries.  —  Let  the  gen- 
eral precede  the  specific  ;   the  whole,  the  part ;   etc. 

18.  Position  of  Instructions.  —  If  the  instructions  are 
not  too  voluminous,  they  should  appear  each  at  the  point 
where it is applicable.

19.  Arrangement  of  Instructions.  —  Care  should  be  taken 
to  arrange  instructions  in  the  order  of  execution. 

20.  Designation  of  Instructions.  —  When  it  is  necessary 
to  separate  instructions  from  their  related  questions,  the 
instructions  should  be  numbered  or  lettered  to  facilitate 
reference.  (N.B.  If  the  questions,  etc.,  are  numbered,  the 
instructions  should  be  lettered,  and  vice  versa.) 

21.  References  to  Instructions.  —  Insert  references  to 
specific  instruction  in  box  headings,  etc.,  when  it  is  not 
practicable  to  print  them  in  position. 

22.  Ambiguity.  —  It  is  not  enough  that  the  expressions 
used  reflect  the  picture  in  the  mind  of  the  author,  they 
should  be  such  that  the  reader  must  perforce  visualize  the 
same  picture. 

23.  Terminology.  —  So  far  as  possible  use  terms  which 
are  familiar  to  the  respondents.  Employ  standard  terms 
where  standards  have  been  fixed.  Define  all  terms  which 
otherwise  might  be  employed  or  understood  in  more  than 
one  way. 

24.  Tabular  Arrangement.  —  Frequently  a  tabular  arrange- 
ment, combining  several  questions,  not  only  saves  much  ver- 
bal repetition  in  the  questions,  but  also  makes  the  logical 
relation  clearer  and  facilitates  the  work  of  answering. 

25.  Form  of  Answer.  —  In  general,  questions  should  be 
in  the  form  best  adapted  to  facilitate  answers.  Give  pref- 
erence to  questions  which  can  be  answered  by  "Yes"  or 
"No"  or  by  a  number.  If  answers  are  to  be  given  by 
checking or crossing out words explain clearly which practice
is to be followed. Arrange the typography to facilitate
the  method  and  stick  to  the  one  method  throughout  the 
entire  form. 

26.  Columns.  —  If  numbers  are  to  be  entered  which  have 
to  be  added  arrange  the  questionnaire  so  that  the  numbers 
will  fall  into  columns. 

27.  Calculations. — As a rule, do not ask the respondent
to  do  arithmetic. 

28.  Estimates.  —  When  the  obtaining  of  exact  quanti- 
ties involves  great  labor,  consider  whether  estimates  can- 
not be  used  instead.  If  such  is  the  case,  state  clearly  that 
an  estimate  will  suffice. 

29.  Articulation.  —  So  far  as  possible,  make  the  ques- 
tions such  that  the  answers  must  corroborate  each  other. 

30.  Letter  of  Transmittal.  —  In  practically  all  cases  the 
questionnaire  should  be  accompanied  by  a  letter  cover- 
ing the  general  situation ;  when  the  data  requested  are  few, 
the  letter  may  be  placed  on  the  upper  half  of  the  sheet  and 
the  questionnaire  below.  In  such  cases  do  not  fail  to  in- 
close a  duplicate  for  respondent's  file. 

31.  Typography  of  Letter.  —  The  general  appearance 
of  the  letter  should  be  such  that  it  will  not  be  confused  with 
advertising  matter.  The  multigraph  is  to  be  preferred  to 
the  mimeograph  for  such  letters. 

32.  Tone  of  Letter.  —  Show  the  reason  for  requesting  the 
information  and  avoid  dictatorial  phrases. 

33.  Due  Date.  —  It  is  advisable  to  have  a  set  time  by 
which  the  return  must  be  in  the  hands  of  the  inquirer. 

34.  Duplicate  Blanks.  —  Send  all  blanks  in  duplicate,  at 
least,  so  that  the  respondent  may  retain  a  copy  in  his  files. 

35.  Return  Envelope.  —  It  is  advisable  to  inclose  a 
self-addressed  envelope.  (Use  addressograph  or  similar 
device  for  this.) 


REVIEW 

1.  Secure  some  sample  questionnaires  from  state,  national,  or 
local  administrative  bodies,  and  test  them  according  to  the  stand- 
ards suggested. 

2.  Which  of  the  standards  enumerated  seem  to  you  to  have 
universal  application ;  which  might  be  deviated  from  without 
serious  results? 

3.  Explain  and  illustrate  what  is  meant  by  point  17. 

4.  Work  out  alternative  methods,  as  suggested  in  point  18,  of 
arranging  several  questions. 

Editing  of  Schedules  ^ 

Editing  is  a  process  preliminary  to  tabulation.  It  does 
not  necessarily  imply  inaccuracies  in  the  schedule  returns,  al- 
though inaccuracies,  some  of  which  can  be  corrected  by  the 
editor,  will  generally  be  discovered  in  the  process  of  editing, 
and  in  some  classes  of  schedules  as,  for  example,  in  those 
making  returns  of  financial  statistics  of  corporations  or  mu- 
nicipalities, the  correction  of  errors  by  editing  may  materially 
affect  the  results  of  the  tabulation.  Schedule  editing  is, 
nevertheless, even in the exceptional cases noted, primarily
formal  rather  than  corrective,  since  the  schedule  data  are 
original,  and  are  not  subject  to  material  revision  where  the 
several  replies  are  consistent  with  one  another,  except  by  re- 
ferring the  schedule  back  to  the  enumerating  agency,  or  by 
initiating  a  new  enumeration. 

The  general  purposes  of  schedule  editing  are  to  insure,  in 
as  high  a  degree  as  possible,  (1)  accuracy,  (2)  consistency, 
(3)  uniformity,  and  (4)  completeness  in  the  schedule  returns. 

^ Adapted with permission from Bailey, W. B., and Cummings, John,
Statistics, A. C. McClurg and Co., Chicago, 1917, pp. 17-25.

1.    Accuracy

Certain replies may raise a presumption of error, and in
some cases this presumption may be sufficient to warrant
investigation and verification. . . . Schedules, or copies of
schedules,  collected  by  mail  from  manufacturing  establish- 
ments or  public  service  corporations  or  steam  railways,  after 
examination  in  the  central  office,  are  frequently  returned  to 
the  reporting  agencies  for  correction,  or  letters  of  inquiry 
covering  certain  points  in  the  schedule  are  sent  out  calling  for 
correct  data. 

Generally,  however,  the  editor  must  accept  the  schedule 
as  it  is  presented  to  him  without  further  reference  to  the 
enumerating  or  reporting  agencies. 

When  inconsistent  or  impossible  replies  have  been  entered 
upon  the  schedule  as  finally  accepted  by  the  central  office, 
it  must  be  edited  into  consistency ;  since  the  process  of  tabu- 
lation, which  follows  editing,  exacts  absolute  consistency 
from  each  schedule.  This  editing  for  consistency  may  be 
regarded  as  being  in  a  sense  corrective,  but  it  is  so  only  in  a 
very  limited  and  special  sense,  since  the  scope  of  the  editor's 
authority  to  revise  replies  is  defined  in  the  schedule  itself. 
All  schedule  replies  are  equally  original,  and  the  only  evidence 
competent  to  justify  the  revision  of  one  reply  is  the  evidence 
presented  in  other  replies.  In  editing  for  consistency  the 
editor  makes  such  changes  only  as  the  schedule  itself  demands, 
and  he  exercises  judgment  only  in  determining  which  of  two 
or more inconsistent replies shall be accepted as correct. Although
in some cases it may be impossible to determine with
absolute certainty which reply is correct, generally it is true
that a strong probability of correctness attaches to one reply,
and  there  is  the  further  possibility,  in  cases  where  no  prob- 
ability of  correctness  attaches  to  one  reply  rather  than  the 
other,  of  editing  the  inconsistent  replies  into  the  "no  report" 
class. 

It  is  extremely  important  that  the  editor  should  understand 
and observe strictly the limits upon his authority to make
changes  in  the  schedule,  and  it  should  perhaps  be  noted  as  a 
minor  detail,  first,  that  the  editor  should  never  make  any 
erasures  on  the  schedule  which  will  obliterate  the  original 
return,  and,  secondly,  that  all  revisions  should  be  made  in 
a  distinctive  ink,  so  that  the  work  of  the  editor  will  always  be 
perfectly  apparent,  since  the  work  of  the  editor  itself  may 
be  subject  to  revision  and  should  in  any  case  be  perfectly 
distinguishable  upon  the  schedule. 
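
The rule that the original return is never erased, and that every revision remains visibly the editor's, can be sketched in modern terms as a record that stores the original entry beside any later changes. The Python sketch below is illustrative only; the class `Reply`, its method `revise`, and the sample values are assumptions and do not come from the source.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Reply:
    """One schedule reply; the original entry is never overwritten."""
    inquiry: str
    original: Optional[str]
    # Each revision is kept as (editor, new_value, reason), the
    # counterpart of writing revisions in a distinctive ink.
    revisions: List[Tuple[str, str, str]] = field(default_factory=list)

    def revise(self, editor: str, new_value: str, reason: str) -> None:
        """Record an editorial change without touching the original."""
        self.revisions.append((editor, new_value, reason))

    @property
    def current(self) -> Optional[str]:
        """Value used for tabulation: latest revision, else the original."""
        return self.revisions[-1][1] if self.revisions else self.original


# Hypothetical example: an age return edited into "age unknown".
age = Reply(inquiry="age", original="2")
age.revise("editor A", "age unknown", "inconsistent with other replies")
print(age.original, "->", age.current)
```

Because revisions are appended rather than substituted, a second editor can review or reverse the first editor's work, which is the purpose the text assigns to the distinctive ink.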

Errors  subject  to  editorial  correction  in  returns  of  financial 
or  accounting  statistics  arise  chiefly  from  misunderstandings 
on  the  part  of  those  filling  out  the  schedule,  or  from  failure 
to  make  correct  classifications  of  returns  of  income  and  ex- 
penditure in  constructing  balance  sheets  and  in  making  up 
financial  statements.  Different  practices  of  accounting  in 
different  concerns  and  in  different  municipalities  must  be 
reconciled  so  far  as  possible  by  editing.  In  order  to  avoid  this 
difficulty  the  Interstate  Commerce  Commission  has  found 
it  necessary  to  impose  upon  railroad  and  other  corporations 
subject  to  its  jurisdiction,  uniform  systems  of  accounting, 
prescribing  in  detail  the  accounts  that  shall  be  kept  and  de- 
fining precisely  all  items  that  shall  enter  into  the  capital  ac- 
counts and  into  the  income  accounts.  These  orders  of  the 
Commission,  which  have  been  elaborated  and  promulgated 
from  time  to  time  during  the  past  two  decades,  have  been 
absolutely  essential  as  a  means  of  bringing  in  to  the  Com- 
mission in  the  annual  reports  from  the  railroad  offices,  data 
which  were  susceptible  of  tabulation.  Prior  to  this  action 
on  the  part  of  the  Federal  Commission,  the  various  state 
railroad  commissions  had  published  the  reports  of  the  rail- 
roads, practically  in  the  form  in  which  they  were  made  up 
in  the  several  railroad  offices,  and  these  reports  were  so  va- 
rious in  character  that  compilations  of  value  could  not  be  made 
from them. Where uniform systems of accounting have
not  been  imposed  upon  corporations,  schedule  returns  of 
financial  data  may  require  considerable  editing. 

2.    Consistency 

In  editing  for  consistency,  the  first  step  is  to  determine 
upon  a  method  of  procedure  to  be  followed  in  examining  each 
schedule.  Efficient  and  complete  editing  involves  the  sys- 
tematic examination  of  all  related  replies  in  a  predetermined 
order  of  examination.  This  sort  of  editing  is,  of  course,  im- 
possible where  the  replies  are  absolutely  unrelated  to  one 
another,  and  it  is  impossible  as  between  unrelated  inquiries 
on  any  schedule.  It  is,  for  example,  impossible  on  a  popu- 
lation schedule  to  check  the  age  return  against  the  sex 
return,  or  to  check  the  return  of  nativity  or  of  country  of 
birth  against  the  return  of  marital  condition.  But  many 
inquiries  are  more  or  less  interrelated,  and  in  such  cases  the 
reply  to  one  inquiry  determines  within  certain  limits  the  re- 
plies to  other  inquiries.  Marital  condition,  for  example, 
may  carry  certain  implication  as  to  age,  since  practically  all 
married,  widowed,  or  divorced  persons  are  fifteen  years  of 
age or older. A native obviously cannot have been born in
a  foreign  country  —  although  children  born  of  American 
citizens  living  abroad  have  been  classified  as  natives  of  the 
United  States  in  order  to  avoid  too  great  detail  of  tabulation. 
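
The examination of related replies in a predetermined order might be sketched as a fixed list of cross-checks applied to every schedule. The following Python sketch is one possible reading, not the authors' procedure; the field names (`age`, `marital_condition`, `nativity`, `born_abroad`) are hypothetical, and the age of fifteen is the figure given in the text. The function only flags disagreements for the editor and changes nothing.

```python
from typing import Dict, List

MIN_MARRIAGE_AGE = 15  # figure taken from the text above
EVER_MARRIED = {"married", "widowed", "divorced"}


def consistency_flags(record: Dict[str, object]) -> List[str]:
    """Return descriptions of related replies that disagree."""
    flags = []
    age = record.get("age")
    # Check 1: marital condition against age.
    if record.get("marital_condition") in EVER_MARRIED and isinstance(age, int):
        if age < MIN_MARRIAGE_AGE:
            flags.append("marital condition implies age 15 or older")
    # Check 2: nativity against country of birth.
    if record.get("nativity") == "native" and record.get("born_abroad") is True:
        flags.append("native returned with foreign country of birth")
    return flags


# Hypothetical schedule line.
record = {"age": 9, "marital_condition": "widowed",
          "nativity": "native", "born_abroad": False}
print(consistency_flags(record))
```

Unrelated inquiries, such as age and sex, have no entry in the list of checks, which mirrors the observation that such replies cannot be tested against one another.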

Totals  which  are  inconsistent  with  constituent  items  shown 
may  be  entered  upon  a  schedule,  as  in  the  case  of  detail 
of  income  and  expenditure  which  does  not  check  up  with 
the  statement  of  total  income  and  expenditure  ;  or  of  detail 
regarding  individuals  in  a  family  where  the  total  number  in 
the family, as stated, does not correspond with the number
of  individuals  for  which  returns  are  made  ;  or  where  a  family 
budget  is  incorrectly  totaled  and  balanced. 
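
A check of stated totals against their constituent items is simple arithmetic and may be sketched as follows. The function name `total_agrees` and the budget figures are hypothetical illustrations, not data from the source; whole numbers (for example, dollars or cents) are assumed so that the comparison is exact.

```python
from typing import Iterable


def total_agrees(items: Iterable[int], stated_total: int) -> bool:
    """True when the stated total equals the sum of the items shown."""
    return sum(items) == stated_total


# Hypothetical family budget: the items sum to 695, but 700 is stated.
budget_items = [420, 180, 95]
print(total_agrees(budget_items, 700))  # disagreement to be edited
print(total_agrees(budget_items, 695))  # consistent return

# The same check applies to persons in a family: the number of
# individual returns is compared with the stated family size.
persons_returned = ["head", "wife", "son"]
print(len(persons_returned) == 4)
```

The check shows only that a disagreement exists; whether the total or one of the items is wrong remains a matter for the editor's judgment.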


Generally  inconsistencies  are  evidence  of  carelessness  on 
the part of the enumerator, or of misunderstanding or ignorance
on the part of the person filling out the schedule.

In  some  cases  the  inconsistency  is  not  absolute,  but  is  of 
such  a  nature  as  to  make  the  return  highly  improbable.  The 
return  of  certain  gainful  occupations  in  the  case  of  women 
and  young  children,  for  example,  while  it  may  be  highly  im- 
probable, may  be  nevertheless  within  the  range  of  possibility. 
It  is  highly  improbable,  but  not  impossible,  that  a  child  under 
fourteen  years  of  age  is  or  has  been  married.  Generally,  if 
the  return  is  within  the  range  of  reasonable  possibility,  it 
must  be  accepted  as  correct  unless  it  can  be  corrected  by 
some  other  related  reply.  The  return  that  a  person  was  the 
head  of  a  family,  and  was  employed  in  some  gainful  occupa- 
tion, together  with  other  detail  on  the  schedule  might  in 
some  cases  justify  editing  an  inconsistent  age  return  as  "age 
unknown"  on  the  strong  probability  that  an  error  had  been 
made  in  recording  the  age,  possibly  by  omitting  one  figure 
in  writing  the  age,  as  in  recording  a  person  of  the  age  twenty 
years,  as  of  the  age  two  years. 
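
The case just described, in which an improbable age is edited to "age unknown" on the strength of other replies, might be sketched as below. This is a hedged illustration: the field names, the function `edit_age`, and the threshold parameter `min_adult_age` are assumptions introduced here. Note that the sketch does not guess the intended age (twenty for two); it only withdraws the improbable figure.

```python
from typing import Dict


def edit_age(record: Dict[str, object], min_adult_age: int = 15) -> Dict[str, object]:
    """Return a copy with an improbable age edited to 'age unknown'.

    The age is changed only when other replies (head of family and a
    gainful occupation) make a very low age highly improbable.
    """
    edited = dict(record)  # the original record is left untouched
    age = record.get("age")
    supports_adult = (record.get("relation") == "head of family"
                      and bool(record.get("occupation")))
    if isinstance(age, int) and age < min_adult_age and supports_adult:
        edited["age"] = "age unknown"
    return edited


# Hypothetical return: age written as 2, probably for 20.
original = {"age": 2, "relation": "head of family", "occupation": "carpenter"}
print(edit_age(original)["age"])
```

A return lacking the supporting replies is accepted as it stands, in keeping with the rule that a reply within the range of reasonable possibility must be accepted unless another related reply corrects it.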

Inconsistencies are not always apparent upon examination
of individual schedules. Replies, which upon examination
of individual schedules appear merely in some degree
exceptional  or  somewhat  improbable,  may  develop  a  high 
degree  of  improbability  in  the  process  of  tabulation.  One 
instance  of  this  sort  may  be  cited.  At  the  census  of  1900, 
it  was  found  upon  tabulating  the  returns  that  the  number  of 
Negroes  returned  as  "unable  to  speak  English"  was  so  large 
as  to  be  highly  improbable.  This  return  could  not  be  edited 
out  of  the  schedules,  because  it  was  entirely  possible  that 
any  given  Negro  might  be  unable  to  speak  English,  but  it 
was  exceedingly  improbable  that  the  number  unable  to  speak 
English should be so great as developed upon tabulation of
the returns. Upon examination of the schedule used at
this census, the probable explanation of the erroneous
returns became apparent. In contiguous columns the schedule
called for answers to the inquiries as to the person's
ability to read and to write and to speak English. In the
case of whites, the usual and correct return to these inquiries
necessitated writing "Yes, Yes, Yes," and in some cases it
was "No, No, No." In the case of many illiterate Negroes,
the enumerators made the partially incorrect return "No,
No, No," instead of the correct return "No, No, Yes." In
consequence of this accidental arrangement of columns on
the schedule, the tabulation relating to ability to speak English
for the Negro element had to be abandoned. At the
Thirteenth Census the columns of the population schedule
were rearranged, and much more accurate returns were secured
to this inquiry.
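
An improbability that appears only upon tabulation can be looked for by comparing, group by group, the share of schedules giving a particular reply with a ceiling that the statistician regards as plausible. The Python sketch below assumes such a ceiling is supplied by the analyst; the function name, the group labels "A" and "B", and the counts are invented for illustration and are not census figures.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def improbable_shares(rows: Iterable[Tuple[str, str]], reply: str,
                      ceiling: float) -> List[Tuple[str, float]]:
    """List (group, share) pairs where the share giving `reply` exceeds `ceiling`."""
    totals: Counter = Counter()
    hits: Counter = Counter()
    for group, value in rows:
        totals[group] += 1
        if value == reply:
            hits[group] += 1
    flagged = []
    for group in sorted(totals):
        share = hits[group] / totals[group]
        if share > ceiling:
            flagged.append((group, share))
    return flagged


# Invented returns: (group, reply to a single inquiry).
rows = ([("A", "yes")] * 95 + [("A", "no")] * 5
        + [("B", "yes")] * 60 + [("B", "no")] * 40)
print(improbable_shares(rows, reply="no", ceiling=0.10))
```

A flagged group cannot be corrected schedule by schedule, since each individual reply is possible; the result points instead to the schedule design or the enumeration as the thing to examine.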

In  the  construction  of  schedules  it  is  sometimes  advisable 
to  introduce  overlapping,  or  even  duplicating  inquiries,  in 
order  to  provide  checks  for  important  inquiries,  where  the 
chance  of  error  is  considerable,  as  in  the  case  where  the  in- 
quiry calling  for  age  is  duplicated  by  an  inquiry  calling  for 
date of birth. Inconsistent replies to such inquiries must be
edited  out  by  examination  of  other  replies,  or  by  an  ar- 
bitrary selection  of  one  reply  as  being  correct.  This  pro- 
cedure is,  however,  seldom  justifiable,  since  the  disadvantages 
of  complicating  the  schedule  more  than  offset  any  gain  in 
accuracy  in  the  case  of  individual  schedules. 
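
Where a schedule does carry both an age and a date of birth, the check of one against the other can be sketched as follows. The function names and the dates are assumptions chosen for illustration; the reference day in particular is arbitrary.

```python
from datetime import date


def age_on(birth: date, reference_day: date) -> int:
    """Completed years of age on the reference day."""
    years = reference_day.year - birth.year
    # Subtract one year if the birthday has not yet occurred.
    if (reference_day.month, reference_day.day) < (birth.month, birth.day):
        years -= 1
    return years


def age_agrees(stated_age: int, birth: date, reference_day: date) -> bool:
    """True when the stated age matches the age implied by the birth date."""
    return stated_age == age_on(birth, reference_day)


reference = date(1910, 4, 15)      # arbitrary day chosen for the example
born = date(1880, 6, 1)            # hypothetical date of birth
print(age_on(born, reference))
print(age_agrees(30, born, reference))
```

When the two replies disagree, the sketch reports the fact but gives no ground for preferring either reply, which is the difficulty the text raises against duplicate inquiries.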

3.    Uniformity 

Editing  for  uniformity  is  required  where  replies,  in  them- 
selves correct,  are  variously  stated.  Editing  of  occupational 
returns are largely of this character. A given occupation may
be designated variously in different sections of the country,
or  it  may  be  variously  returned  from  each  section  of  the 
country.  The  return  may,  of  course,  be  vague  and  indeter- 
minate, as  where  a  person  is  returned  as  a  "clerk"  or  a  "me- 
chanic" or  an  "engineer"  or  an  "artist"  or  an  "operative." 

In every case it is necessary to determine upon occupational
designations  which  will  consistently  group  the  returns  for 
tabulation.  Moreover,  since  the  number  of  occupational 
employments  returned  in  any  extensive  inquiry  may  amount 
to  several  thousand  —  at  the  Thirteenth  Census  some  9000 
different  employments  were  distinguished  —  and  since  many 
of  these  employments  are  each  of  them  common  to  many 
different  industries,  and  since  occupational  returns  are  fre- 
quently tabulated  by  industry  as  well  as  by  occupation,  some 
scheme  of  arbitrary  symbols  must  generally  be  devised  for 
editing  the  occupational  returns  into  uniformity  for  tabula- 
tion. Commonly,  the  industry  and  the  employment  returned 
are designated by a simple combination of letters and figures,
new  symbols  being  assigned  to  each  new  employment  dis- 
covered in  the  process  of  editing.  The  tabulation  is  then 
made mechanically from the symbols which have been edited
on  the  schedules,  in  any  combination  that  seems  advisable 
when  the  editing  has  been  completed.  After  tabulation  the 
occupational  designation  is  substituted  for  the  symbol. 
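
The symbol scheme just described may be sketched in Python. This is a minimal illustration under stated assumptions, not a record of census practice: the sample returns, the form of the symbol (a letter for the industry followed by a number for the employment), and the function names `edit_returns` and `tabulate` are all hypothetical.

```python
from collections import Counter


def edit_returns(returns):
    """Edit (industry, employment) returns into symbols such as "A1".

    A new letter is assigned to each industry and a new number to each
    employment as it is first discovered. The scheme is hypothetical and,
    as written, supplies letters for only 26 industries.
    """
    industry_letters = {}
    employment_numbers = {}
    key = {}      # symbol -> (industry, employment), used after tabulation
    edited = []
    for industry, employment in returns:
        # Reduce variously stated replies to one uniform spelling.
        ind = " ".join(industry.lower().split())
        emp = " ".join(employment.lower().split())
        if ind not in industry_letters:
            industry_letters[ind] = chr(ord("A") + len(industry_letters))
        if emp not in employment_numbers:
            employment_numbers[emp] = len(employment_numbers) + 1
        symbol = f"{industry_letters[ind]}{employment_numbers[emp]}"
        key[symbol] = (ind, emp)
        edited.append(symbol)
    return edited, key


def tabulate(edited, key):
    """Count the symbols, then substitute the designation for each symbol."""
    counts = Counter(edited)
    return {key[symbol]: n for symbol, n in sorted(counts.items())}


returns = [
    ("Cotton mill", "Weaver"),
    ("cotton  mill", "weaver"),
    ("Cotton mill", "Clerk"),
    ("Railroad", "Clerk"),
]
edited, key = edit_returns(returns)
table = tabulate(edited, key)
```

Because the count is made on the symbols alone, the same edited schedules could be regrouped by industry letter or by employment number without being edited a second time.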

A  minor  instance  of  editing  for  uniformity  is  found  in  the 
rounding  out  of  numbers  to  be  stated  in  hundreds  or  thou- 
sands, instead  of  units,  or  in  full  units  instead  of  in  fractions 
of  a  unit.  This  is  done  where  the  character  of  the  data  does 
not  warrant  a  statement  varying  by  small  units,  or  fractions. 

4.    Completeness 

Editing  for  completeness  also  is  formal  rather  than  cor- 
rective. This  sort  of  editing  may  consist  either  in  entering 
upon the schedule derivative data, or in entering replies to
inquiries  which  have  not  been  answered.  Not  infrequently, 
especially  in  schedules  calling  for  financial  data,  percentages 
or  other  derived  figures  are  required  for  tabulation  which  are 
not  specifically  called  for  in  the  schedule.  These  must  be 
computed  in  the  statistical  office  and  edited  on  the  schedule. 
On  the  other  hand,  replies  called  for  by  the  schedule  may  be 
omitted,  and  these  must  be  supplied,  since  for  purposes  of 
tabulation  a  definite  reply  must  be  entered  on  the  schedule 
for  every  inquiry  calling  for  a  reply.  Where  no  specific  reply 
is indicated by other data on the schedule, the reply edited
in must be "no report," "unknown," or some similar entry.
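
Both kinds of editing for completeness may be sketched in Python. The sketch rests on assumptions: a schedule is represented as a dictionary, the inquiry names (`wages`, `total_expense`, and so on) are invented for illustration, and only the case in which no other data on the schedule indicate a reply is handled.

```python
NO_REPORT = "no report"


def edit_for_completeness(schedule, inquiries):
    """Return a copy of the schedule with a definite reply to every inquiry."""
    edited = {}
    for inquiry in inquiries:
        reply = schedule.get(inquiry)
        # A missing or blank reply is edited in as "no report".
        if reply is None or (isinstance(reply, str) and not reply.strip()):
            reply = NO_REPORT
        edited[inquiry] = reply
    return edited


def add_percentage(schedule, part, whole, name):
    """Enter a derived percentage, computed in the office, on the schedule."""
    edited = dict(schedule)
    p, w = edited.get(part), edited.get(whole)
    if isinstance(p, (int, float)) and isinstance(w, (int, float)) and w != 0:
        edited[name] = round(100 * p / w, 1)
    else:
        edited[name] = NO_REPORT
    return edited


schedule = {"wages": 250, "total_expense": 1000, "rent": ""}
inquiries = ["wages", "total_expense", "rent", "insurance"]
complete = edit_for_completeness(schedule, inquiries)
complete = add_percentage(complete, "wages", "total_expense", "wages_pct")
```

In this example the blank reply for rent and the omitted reply for insurance are both entered as "no report", and wages are entered as 25.0 per cent of total expense.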

REVIEW 

1.  Do  you  agree  that  editing  is  a  process  always  preliminary  to 
tabulation?  Is  not  tabulation  often  involved  in  schedule  making 
or  in  securing  answers  to  schedules?  How  do  you  then  support 
the  contention  of  the  writer? 

2.  What  are  the  steps  involved  in  editing?  Do  they  necessarily 
follow  the  order  given  by  the  writer  ?     Why  ? 

3.  Contrast  accuracy  and  consistency,  as  developed  by  the 
writer.  Are  the  terms  used  interchangeably?  Do  they  involve 
the  same  idea?  Might  the  data  be  consistent  but  the  editor  of 
the  data  be  inconsistent  in  editing  them?  How  is  the  latter  con- 
dition to  be  guarded  against  ? 

4.  Contrast  accuracy,  consistency,  and  uniformity. 

5. What does the writer mean by saying that "editing for completeness
also is formal rather than corrective"?

Questionnaire  Relating  to  the  Distribution,  Owner- 
ship, Operation,  and  Physical  Characteristics  of 
Saloons,  Prepared  by  the  Chicago  Commission  on 
the  Liquor  Problem. 

1.    Give  the  name  of  owner  of  each  saloon  doing  business 
at  present,  with  address  and  police  precinct. 


2.  State  whether  the  saloon  is  controlled  by  a  brewery, 
by  reason  of  the  brewery  owning  license  to  such  saloon. 

3.  State  the  license  record  or  history  of  each  saloonkeeper, 
that  is  if  such  saloonkeeper  has  ever  been  in  trouble  or  re- 
ported for  violating  the  law ;  whether  warnings  have  been 
given  to  such  saloonkeeper  with  respect  to  violations  or  mis- 
conduct ;  if  the  license  of  the  saloonkeeper  has  ever  been 
revoked  for  cause ;  if  ever  convicted  and  fined  for  breaking 
the  law ;  and  other  information  of  this  nature. 

4.  State  who  actually  operates  and  conducts  such  saloon, 
that  is,  is  the  man  who  actually  operates  and  conducts  the 
saloon  the  real  owner  or  merely  the  agent  or  employee  of 
some  other  person  or  party  who  holds  or  owns  the  license? 

5.  Give  the  name  of  person  appearing  on  the  city  license 
for  each  saloon. 

6.  State  number  of  employees  of  each  saloon,  the  nature 
of  the  occupation  of  such  employees,  that  is,  whether  em- 
ployed as  bartender,  porter,  and  the  like,  and  give  name  and 
address  of  each  employee. 

7.  State  whether  the  government  liquor  license  is  in  the 
name  of  one  person  or  corporation,  and  whether  the  city 
liquor  license  is  in  the  name  of  another  person  or  corporation. 

8.  State  whether  fixtures  in  saloon,  as  well  as  lease  to  the 
premises, are owned by the holder of the license, or by the
person  actually  operating  the  saloon,  or  by  the  brewery. 

9.  State  whether  partitions,  stalls,  private  winerooms, 
or  palm  and  picnic  gardens  are  permitted  in  and  about  the 
premises  of  the  saloon. 

10.  State  whether  dances  are  permitted  to  be  held  in 
the  rear  rooms  of  each  saloon  or  in  any  other  portion  of  the 
building  in  which  such  saloon  is  located. 

11.  State  whether  the  saloon  is  within  250  feet  of  a  public 
or  private  school,  church,  or  any  public  institution. 


12.  State  whether  the  saloon  has  direct  connection  with 
hotels,  bedrooms,  or  other  private  rooms,  whether  in  the  rear, 
side  of  the  saloon,  or  overhead. 

13.  State  whether  the  front,  side,  and  rear  entrances  and 
exits  to  the  saloon  open  into  a  street,  alley,  yard,  or  other  open 
grounds,  or  otherwise. 

14.  State  whether  the  saloon  has  a  cabaret,  music,  or  other 
form  of  amusement  in  or  about  the  premises. 

15.  Give  other  facts  regarding  conditions  in  saloons  not 
noted  above. 

REVIEW  PROBLEMS 

1.  Criticize  the  general  form  of  this  questionnaire. 

2. Using sections 9, 10, 12, 13, and 14, and following the instructions
in the Text and in Points to be Considered in the Use and Form
of  a  Questionnaire,  arrange  them  in  the  form  of  a  questionnaire, 
which  can  be  statistically  handled. 

REVIEW  PROBLEMS 

1.  Using  the  form  of  the  questionnaire  on  page  239,  tabulate 
the  descriptive  detail  of  the  house  in  which  you  are  living.  Work 
out,  with  the  other  members  of  the  class,  a  uniform  code  system  to 
designate  the  presence,  absence,  or  number  of  each  descriptive  detail. 

2.  Preserve  your  descriptions  for  later  use. 

REVIEW  PROBLEMS 

1.  Answer  question  3,  Section  D  of  the  schedule  on  page  240  in 
such  a  form  that  your  answer  would  be  statistically  usable  for 

(1)  medical  purposes : 

(2)  assignment  of  responsibility  as  between  the  person  injured, 
the  nature  of  the  work  done,  and  the  condition  of  the  machine 
operated. 

2.  Which,  if  any,  of  the  questions  seem  to  you  to  be  poorly 
worded  ?     Why  ? 


SCHEDULE FOR DESCRIPTION OF BUILDINGS AND THEIR LOCATION.¹

Dist Map Blk Lot Pg Line 

Examined 1910 

By :.No 

Assistant  Assessor 

Single  house       one  side  of  double  house       one  of  row       Duplex  house 

No St. 

Ave. 

Material — Siding  drop,  lap,  shingles,  brick,  common-press,  plaster,  veneer,  stone,  cut,  rough, 
concrete  tile  T.  C.  Trimmings  —  plain,  ornamental,  stone,  cut,  rough 

T.  C,  brick,  wood.  Upon  a  foundation  of  stone,  brick,  tile,  concrete,  posts. 

Main  floor feet  above  ground. 

Dimensions — Wide, deep, wide, deep, wide, deep, wide, deep;

story  story 

high  wide,  deep,  wide,  deep,  wide,  deep  high 

Projections  —        One  story                            two  story  three  story  tower 

bay  window  bay  window  bay  window 

front                                          side  rear 

porch                                         porch  porch 

Roof  —  Shingles,  slate,  tile,   gravel,  composition,  tin,  copper.     Hip,  gable,  flat,  mansard, 
dormers  or  gables.  Cornice  —  plain,  ornamental,  wood,  metal,  stone,  T.C. 

Divisions  —  Basement,  cellar,   under   whole,  front,   middle,   rear  containing 

storage  water  heating  laundry  bath 

room  closet  plant  tubs 

1st  story  —  hall,  parlor,  sitting  room,  library,  diningroom,  kitchen,  bathroom,  bedroom. 

2d  story  —  bedroom,  bathroom,  other  rooms. 

3d  story  —  bedrooms,  bathroom,  other  rooms. 

4th  story  —  bedroom,  bathroom,  other  rooms.     Attic  —  rooms  finished,  unfinished. 

Inside Finish. Main part, lower story — ornamental, plain, hardwood, pine, oil, paint.

Upper story — hardwood, pine, oil, paint.
Heating — Stoves, furnace, hot water, steam, combination.

Water — Open well, city, in yard, basement, first story, second story, third story.

plumbing  —  bathrooms,  water  closet,  wash  basin,  laundry  tray,  sink,  bant,  —  open, 
closed. 

Lighting  —  Gas,  Electric,  Oil.  Fixtures  —  Plain,  Ornamental. 

Drainage  —  Cesspool,  sewer.  Building  in  good,  fair,  bad,  repair. 

Vacant,  occupied,  owner,  tenant.  Rents  at  $  per  month. 

Name  of  Owner,  Agent,  Tenant. 

S Rate $ per $ square .  .  .$ foot 

Barn  —  Wood,  brick,  stone,  wide,  deep,  stories  high 

contains  stalls,  living  rooms 

Sidewalk — Wood, stone, cement, brick, Curb, wood, stone, granite
Condition  —  good,  fair,  bad. 

Lot  Surface  —  Level,  uneven ;                              about                          feet  above,  below  grade 
Barn  $ Bill  Board 


¹ Taken from First Quadrennial Assessment of Real Property of the City of Cleveland,
1910, p. 20.



REPORT   OF  A  PERSONAL  INJURY   TO   AN   EMPLOYEE 

REPORT  NO.    1 

AN  ANSWER  SHOULD  BE  MADE  TO   EVERY  QUESTION 


Sec. A.
Employer, Place and Time.


Employer's  name 

Office  address :  Street  and  No 

City  or  town 

Business  (state  exact  nature) 

Location  of  plant  where  injury  occurred 

Street  and  No City  or  town  . 

Date  of  injury 

Day  of  week 

Hour  of  day 


Sec.  B. 
Insurance.


1. Are you insured to provide payment to injured employees under

the  Workmen's  Compensation  Act? 

2.  If  so  insured,  give  name   and  business  address  of  the  insurance 

association  or  company 

3.  Has  injured  employee  given  notice  in  writing  reserving  common 

law  rights? 4.    If  so,  when? 


Sec. C.
Injured Person.


1. Name of injured employee

2. Address

3. Sex    4. Age

5. Occupation when injured

6. In what department or branch of work?

7. Was this the regular occupation of employee?

8. If not, state regular occupation

9. Was injured employee piece or time worker?

10. Wages, or average earnings weekly


Sec. D.
Cause.


1. Name of machine, tool, appliance, etc., in connection with which

injury occurred

2. Hand feed or mechanical

3. Describe fully how injury occurred

4. Part on which injury occurred

5. Is it possible to provide a guard, safety appliance, or regulation in

connection with this machine that might have prevented this
injury?

6. What guard, safety appliance, or regulation to guard against the

injury was in use when it occurred?


Sec. E.
Nature of Injury.


1. Part of person injured (state whether right or left in case of arms

or hands)

2. Nature of injury, as near as possible

3. Attending physician or hospital where sent, name and address

4. State probable period of disability (number of days employee is

expected to be absent from employment, dating from day of injury)


Date  of  Report Made  out  by. 


3.  On  the  supposition  that  you  were  in  receipt  of  one  hundred 
schedules  of  this  type,  write  out  a  full  set  of  instructions  to  a  group 
of  clerks  for  editing  the  same. 

4.  Respecting  statistical  analysis  : 

(1)  Name  a  business  or  other  problem,  preferably  out  of  your  own 
experience,  which  can  be  studied  statistically. 

(2)  State  clearly  and  definitely  a  purpose  to  be  accomplished  in 
such  a  study  of  this  problem. 

(3) Indicate the sources of information to which you would go for
data,  indicating  the  statistical  peculiarities,  limitations,  and  virtues 
of  the  data. 

(4)  Indicate  how  the  data  would  be  selected,  collected,  or  sum- 
mated  and  what  cautions  would  have  to  be  observed  in  securing 
them. 

(5) Define sufficiently for statistical use the units of measurements
which  you  would  employ. 

(6) Formulate a questionnaire containing six questions bearing
unmistakably  on  your  purpose. 


CHAPTER  V 
CLASSIFICATION  —  TABULAR   PRESENTATION 
The Purpose and Method of Tabulation ¹

Nature  of  Tabulation.  —  The  general  meaning  of  the  word 
"table"  appears  to  be  an  even  flat  surface  with  breadth  not 
disproportionately  small  in  comparison  with  length  or,  con- 
cretely, an  object  characterized  by  the  possession  of  such  a 
surface.  The  arrangement  of  ordinary  reading  matter  is 
in a line or lines, while a statistical table presents itself as a
surface. 

The  table  thus  differs  from  the  ordinary  page  of  letter  type 
not  merely  in  being  composed  mainly  of  figures,  but  also  in 
being  readable  in  two  dimensions,  that  is,  at  least  vertically 
as  well  as  horizontally.  "Reading  matter"  may  also  be  a 
list of numbers. But the arrangement of the line (or "lines")
of  ordinary  reading  matter  running  back  and  forth  on  the 
page  is  not  on  a  surface  plan.  A  line  of  running  print  can 
be  followed  but  one  way.  Such  a  line  is  like  a  string  of  beads, 
but  with  the  type  (as  the  beads)  interrupted  on  the  parts  of 
the  string  extending  from  right  to  left  and  in  position  on 
the  string  as  the  line  passes  from  left  to  right.  The  reader's 
eye  must  follow  the  string.  A  statistical  table,  on  the  other 
hand,  can  be  read  either  down  or  across.  It  utilizes  the  di- 
mensions of  a  surface.  According  to  this  conception,  a  list 
is  not  a  table  and  a  single  column  does  not  constitute  a  table. 

¹ Adapted with permission from Watkins, G. P., "Theory of Statistical
Tabulation,"  Quarterly  Publications  of  the  American  Statistical  Association, 
December,  1915,  pp.  742-757. 

A  table  may  also  sometimes  be  read  diagonally,  especially 
one  of  content  and  form  such  as  to  show  correlation.  The 
ages  of  men  and  of  their  wives,  the  age  and  the  grade  of  school 
children,  etc.,  may  conveniently  be  compared  with  reference 
to the most frequent combinations in this way.

Matter  not  of  a  statistical  character  may  also  be  put  into  a 
table  when  there  is  some  advantage  in  reading  it  more  than 
one  way.  Numerical  data,  whether  statistical  in  character 
or  not,  are  frequently  best  so  arranged.  The  tabular  form 
is  used  to  furnish  data  for,  and  facilitate  the  processes  of, 
computation, as in the familiar tables of logarithms, trigonometric
functions, roots, and powers, etc., and in interest tables.
Here compactness of form and ease of reference are the important
considerations, but these are also the reasons for
being of the statistical table. . . .

Statistical  tables  consist  of  numbers  representing  quanti- 
ties or  degrees  of  concrete  things,  qualities,  or  events.  Hence 
the  importance  of  statistical  units  and  of  their  definite  and 
constant  significance.  Indeed,  the  writer  would  describe 
statistics  in  general  as  concerned  with  concrete  numbers  and 
quantities  and  their  relations.  It  constitutes  a  characteristic 
method  or  methods  of  dealing  with  such  numbers,  and  also 
consists  of   the  material  appropriately  so  dealt  with  .... 

Tabular  presentation  has  conspicuous  advantages  as  re- 
gards economy  of  space  and  of  time  :  of  space,  wherever  the 
same  class  designation  or  name  is  to  be  applied  to  a  large 
number  of  items  brought  together  in  the  table  in  a  single 
line or a single column; of time, on the part of those seeking
information  on  a  specific  point,  in  that,  by  using  line  and 
column  as  guides,  the  specific  fact  sought  can  be  found  directly. 
These  uses  of  the  tabular  form  are  not  peculiar  to  numerical 
tables. 

Tabulation, like speech, is a device for expressing ideas,
and  in  particular  for  expressing  them  compactly  and  in  a 
way to facilitate comparison and show relations. Ordinary
linguistic symbols, Arabic and other numerical notation (including
the symbolic use of position), rulings and spatial
relations,  and  sometimes  forms  special  to  tabular  notation, 
are  all  employed  for  this  purpose.  As  with  language  gen- 
erally, the  tabular  presentation  of  facts  should  say  as  much 
as  possible  with  a  meaning  as  unmistakable  as  possible  in 
as  small  a  compass  as  possible.  There  should  be  no  ambigu- 
ity; hence,  for  example,  blanks  should  mean  but  one  thing. 
Expression  should  be  as  direct  as  possible ;  hence,  for  example, 
information  essential  to  a  prompt  grasping  of  the  meaning 
of  the  table  should  not  be  put  in  footnotes  if  avoidable. 
Reasonable  conventions  regarding  the  use  of  symbols  should 
be  observed.  .  .  . 

Uses  of  a  Statistical  Table.  —  The  stub  of  a  statistical  table 
is most commonly a geographical classification. For groups
of  such  classes  there  will  usually  be  sub-totals  which  condense 
the  more  detailed  classification.  But  the  stub  may  consist 
of  the  names  of  reporting  entities,  as  in  the  case  of  many  pri- 
mary tables  of  corporation  and  financial  statistics.  The 
most  important  statistical  data  for  public-service  corporations 
are  usually  printed  in  such  form  by  the  various  supervising 
commissions,  including  the  Interstate  Commerce  Commission. 
But  for  much  such  data,  especially  for  the  distinctively  sta- 
tistical as  opposed  to  the  financial  part,  the  company  unit 
has little significance and compilations are made by geographical
or other groups of companies. Where the facts are
presented  by  reporting  entities,  the  tabular  form  may  serve 
the  purpose  merely  of  saving  space,  but  the  totals,  which  are 
of  more  statistical  interest,  are  best  obtained,  and  their  com- 
position best  shown,  by  way  of  a  table.  If  it  were  possible 
to provide the necessary space, it would of course be best
always to tabulate by such return or report units, so that the
person  who  used  the  primary  data  could  make  his  own  group- 
ings and  combinations.     However,  especially  where  the  enu- 
meration or  report  unit  is  the  individual  or  the  private  family, 
aggregate  presentation  is  unavoidable.     Hence  the  stub-items 
of a table represent classes, rarely also composite individuals.
In  publishing  statistics  of   manufacturers   and  other  private 
business  enterprises,  the  presentation  of  the  facts  for  one  or 
few  companies  by  themselves  is  expressly  avoided  as  tending 
to  reveal  the  operations  of  individual  establishments  to  com- 
petitors.    Such  procedure  on  the  part  of  the  U.  S.  Census 
Bureau and the various bureaus of labor statistics is undoubtedly
wise administratively, though the fact that a large
business  corporation  with  stock  broadly  owned  cannot  properly 
withhold  from  the  public  any  sort  of  statistical  or  financial 
data  that  is  of  general  interest  should  be  recognized  and 
doubtless  will  in  time  be  accepted  in  practice.     But  at  present 
only  quasi-public  corporations  appear  to  be  dealt  with  sta- 
tistically according  to  this  principle. 

The  statistical  interest  of  a  geographical  stub  is,  of  course, 
not  of  the  highest  rank.  The  consideration  determining  its 
use  is  the  fact  that  a  general  or  primary  table  is  in  the  first 
instance  a  record  and  repository  of  data.  Only  to  a  very 
subordinate  extent  is  it  wise  to  attempt  to  exhibit  relations 
and  significance  in  such  a  table.  In  a  derivative  (analytical 
or  text)  table  the  interest  is  of  course  different.  But  the 
arrangement  of  the  items  even  of  a  geographical  stub  may 
be  made  to  serve  the  purpose  of  explanation  where,  for  ex- 
ample, the  order  of  magnitude  or  of  density  is  followed.  In 
the New York First District Public Service Commission reports,
the arrangement of lighting companies within groups
determined  by  intercorporate  relations  in  the  order  of  size 
(amount of revenues) somewhat increases the statistical interest
of the stub, since it is a step towards making the table
show  correlation.  It  also  puts  first  the  companies  in  which 
a  reader  is  likely  to  be  chiefly  interested,  thus  facilitating  ref- 
erence —  which  fact  is  doubtless  of  more  practical  importance 
than  the  slight  aid  afforded  to  interpretation.  The  order 
of  the  street-railway  groups  of  companies  in  the  same  series 
of  reports  is  in  a  general  way  that  of  expensiveness  of  line 
construction.  These  touches  of  correlational  arrangement 
are  suggestive  of  a  use  of  tabulation  which  seldom  affects 
primary  tables.  The  correlational  use,  however,  supposes 
the  captions  as  well  as  the  stub-items  arranged  according 
to  the  degree  of  some  quality,  and  thus  it  involves  cross- 
classification.  Primary  tables  ought  to  be  planned  with 
reference  to  such  possible  use.  Perhaps  the  presentation  of 
such  cross-classifications  might  well  take  the  place  of  some 
geographical  detail. 

A  statistical  table  is  often  merely,  and  always  incidentally, 
a  presentation  of  items  going  to  make  up  a  total  or  series 
of  totals.  The  separate  columns  may  accordingly  contain 
things  having  little  or  no  relation  to  each  other  and  they  may 
be  given  together  merely  to  save  space  by  making  unnecessary 
the  repetition  of  the  stub.  The  unity  of  a  table,  however, 
will  usually  mean  more  than  this.  But  it  is  doubtless  the 
first  or  simplest  purpose  of  a  table  to  show  this  or  that  aggre- 
gate and  how  it  is  made  up.  The  stub-items  constitute  the 
individual  or  class  names  for  the  things  of  which  the  numbers 
are  the  entries.  The  entries  are  themselves  usually  aggregates. 
But  it  is  possible  to  use  the  tabular  form  for  a  mere  tally 
sheet,  in  which  case  the  entries  represent  the  individual  things. 

In  general  the  stub-item  of  a  statistical  table  stands  for 
a  group  or  class  of  things,  and  the  stub  contains  the  terms  of 
a  classification.  Classifications  in  statistics,  it  should  be 
noted, must be comprehensive, hence there is usually need
of  an  "other"  or  "miscellaneous"  class,  and  commonly  also 
of  an  "unknown"  or  "not  specified"  class.  For  the  rest, 
all  the  principles  conducive  to  right  classification  apply  to 
stub  and  caption  classifications. 
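
The requirement that a classification be comprehensive may be illustrated by a short Python sketch. The class names and the function `classify` are hypothetical; the only point shown is that every return, however stated, falls in exactly one class, so that the class counts add to the number of returns.

```python
from collections import Counter


def classify(reply, classes):
    """Place a reply in a named class, or in a residual class."""
    if reply is None or not str(reply).strip():
        return "unknown"            # nothing was returned
    label = str(reply).strip().lower()
    return label if label in classes else "other"


classes = {"farmer", "clerk"}
replies = ["Clerk", "cooper", "  ", None, "farmer", "Farmer"]
counts = Counter(classify(r, classes) for r in replies)
```

Without the two residual classes, the replies "cooper", the blank, and the missing reply would drop out of the table, and its total would fall short of the six returns received.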

It  is  above  implied  that  the  captions,  also,  as  well  as  the 
stub-items, will usually constitute a classification, or perhaps
more than one classification. The fact that columns
commonly  add  across  to  a  total  column  supposes  this  situ- 
ation. The  statistical  table  thus  becomes  a  mode  of  cross- 
classification. 

In  this  more  highly  evolved  use  of  the  tabular  form,  a 
statistical  table  is  essentially  an  arrangement  of  numerical 
data  by  which  the  data  are  cross-classified  according  to  two 
sets  of  terms,  those  of  the  stub  and  those  of  the  captions. 
The  device  of  sub-classification  is  also  frequently  introduced 
in  the  captions  and  stub  by  way  of  compound  captions,  sub- 
division of  stub-items,  and  sub-totals.  The  more  complicated 
classifications  usually  require  additional  tables  in  series. 
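
A cross-classification of this kind may be sketched in Python as follows. The records, the field names `state` and `sex`, and the two functions are assumptions made only for illustration; the sketch counts individual returns under the stub and caption classes and then adds the total column and the total line.

```python
def cross_classify(records, stub_key, caption_key):
    """Count records under every combination of stub-item and caption."""
    stub = sorted({r[stub_key] for r in records})
    captions = sorted({r[caption_key] for r in records})
    table = {s: {c: 0 for c in captions} for s in stub}
    for r in records:
        table[r[stub_key]][r[caption_key]] += 1
    return stub, captions, table


def with_totals(stub, captions, table):
    """Lay the table out as rows, adding a total column and a total line."""
    rows = []
    for s in stub:
        entries = [table[s][c] for c in captions]
        rows.append([s] + entries + [sum(entries)])
    column_totals = [sum(table[s][c] for s in stub) for c in captions]
    rows.append(["Total"] + column_totals + [sum(column_totals)])
    return rows


records = [
    {"state": "Ohio", "sex": "male"},
    {"state": "Ohio", "sex": "female"},
    {"state": "Iowa", "sex": "male"},
    {"state": "Ohio", "sex": "male"},
]
stub, captions, table = cross_classify(records, "state", "sex")
rows = with_totals(stub, captions, table)
```

Each row adds across to the total column and each column adds down to the total line, which is the sense in which the captions as well as the stub-items constitute a classification.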

Instead  of  the  terms  of  a  classification,  a  time  series,  espe- 
cially a  succession  of  years,  may  be  used  in  the  stub  and  have 
much  the  same  relation  to  the  entries,  except  that  column 
totals  are  then  not  always  significant.  But  such  a  table  is 
usually  derivative. 

Limitations upon Tabular Presentation. — Cross-classification
corresponds to what is known in algebra as combination
and is covered under the topic, "Permutations and
Combinations."  The  mathematical  principle  is  that  the 
number  of  possible  different  combinations  of  one  set  of 
things  or  classes  of  things  (enumerated  in  the  stub-items,  let 
us  say)  with  another  set  (enumerated  and  described  in  the 
captions)  is  equal  to  the  product  of  the  number  of  items  in 
each set. This gives the number of cross-classes or entry-places
in the table. There should be occasion to use most
of  these,  or  else  the  form  of  the  table  needs  revision,  or  at 
least  condensation. 

The  fact  that  cross-classification  is  a  process  of  combination 
serves  to  bring  out  an  important  limitation  upon  the  possi- 
bilities of  tabular  presentation.  It  is  often  desirable  to  show 
the  associations  or  combinations  of  the  units  under  three 
classifications or sets of cases. If the third of these classifications
is merely twofold, the space required is merely double
what  it  was  before.  If  there  are  12  rubrics  under  the  third 
classification,  the  normal  requirement  is  for  12  times  as  much 
place,  or  probably  13  times  as  much,  since  a  total  of  the  12 
classes  will  be  desirable.  If  the  original  stub  provides  for 
30  items  and  there  are  10  columns,  a  presentation  of  all  the 
possible  combinations  with  a  further  series  of  12  classes  will 
require 30 × 10 × 12, or 3600 cross-classes or entry-places.

If  it  is  desired  to  show  completely  by  tabulation  the  re- 
lations between  nativity  in  12  classes,  age  in  10  classes,  sex 
in  2  classes,  residence  in  50  classes,  and  occupation  in  100 
classes,  supposing  every  possible  combination  will  require 
an entry-place, the number of cross-classes will be 12 × 10 × 2
× 50 × 100, or 1,200,000. If the 50 residence rubrics are
made  the  items  of  the  stub  and  10  columns  may  be  put  on 
a  page,  that  would  mean  500  entry-places  to  a  page.  The 
presentation  of  the  facts  would,  therefore,  require  2400 
pages.  But  the  number  of  rubrics  under  each  classification 
is  fewer  than  it  might  be  desirable  to  use.  The  above  com- 
putation, moreover,  does  not  provide  for  totals.  Of  course, 
much  space  could  in  practice  be  saved  by  reason  of  the  omis- 
sion of  provision  for  impossible  or  infrequent  combinations. 
Young  children,  for  example,  will  not  be  found  in  occupa- 
tions. However,  the  limitations  upon  what  we  may  call 
complete tabulation are evident. The size of census volumes,
even  with  their  limitations,  is  thus  explained. 
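
The arithmetic of this paragraph may be checked with a few lines of Python. The class counts are those given in the text; the function name and the dictionary layout are assumptions of the sketch, and `math.prod` requires Python 3.8 or later.

```python
from math import prod


def complete_tabulation(classes, stub, columns_per_page):
    """Return (entry-places, entry-places per page, pages required).

    The rubrics of the classification named by `stub` form the lines of
    each page; no provision is made for totals.
    """
    total = prod(classes.values())
    per_page = classes[stub] * columns_per_page
    pages = -(-total // per_page)   # integer division, rounded up
    return total, per_page, pages


classes = {
    "nativity": 12,
    "age": 10,
    "sex": 2,
    "residence": 50,
    "occupation": 100,
}
total, per_page, pages = complete_tabulation(classes, "residence", 10)
# total == 1_200_000, per_page == 500, pages == 2400
```

Adding one more classification of only five rubrics would multiply the page count by five, which is the practical force of the limitation described.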


The  difficulty  in  question  is  avoided  by  seldom  attempting 
complete  tabulation.  Some  of  the  combinations  are  not 
important  or  not  of  special  interest.  The  classification  of 
those  in  a  specific  occupation  by  nativity,  for  example,  is  of 
interest  for  comparatively  few  occupations  and  comparatively 
few localities. It may often be assumed that the variation
within one kind of classification in terms of another classification
will be so small that a presentation of the facts for
all  of  the  first  class  combined  will  sufficiently  meet  ordinary 
statistical  requirements.  Detailed  compilations  also  may 
often  be  made  to  serve  for  a  number  of  years,  provided  the 
proportions  found  are  representative  and  quite  constant. 
The  frequent  necessity  of  resorting  to  such  methods  —  the 
necessity  in  particular  of  using  alternative  classification  in- 
stead of  cross-classification  —  explains  why  a  given  statistical 
compilation will seldom enable one to answer all the questions
for  which  a  solution  is  sought.  The  facts  are  contained  in 
the returns, but they cannot all be presented.

A  report  schedule  from  which  tabulations  are  made  is 
commonly  itself  in  tabular  form  and  may  contain  a  cross- 
classification.  Only  one  who  has  had  practical  experience 
with  the  problem  of  devising  a  general  table  or  tables  to  con- 
tain what  is  most  important  in  such  returns  can  appreciate  the 
difficulty  of  obtaining  satisfactory  results  in  a  limited  space. 
But  the  reader  is  prepared  for  an  application  of  the  theory 
of  mathematical  combinations  to  such  a  case.  If  only  50 
such  report  schedules  are  to  be  tabulated  in  a  way  to  show 
the  individual  returns  and  supposing  the  schedule  has  10 
stub-items  and  20  captions,  then  in  order  to  present  all  the 
facts  it  would  be  necessary  to  provide  at  least  200  columns 
of  50-line  tabular  matter.  Alternative  tabulation,  on  the 
other  hand,  which  would  utilize  only  the  cross  and  down 
totals of the schedule, would require 30 columns. It is
assumed,  of  course,  that  the  data  of  each  schedule  are  them- 
selves aggregates  and  that  each  such  aggregation  has  interest 
of  its  own.  If  only  the  totals  for  the  50  returns  taken  to- 
gether are  wanted,  only  as  many  entry-places  are  required 
as are contained on one of the schedules, that is, 20 × 10 + 31
(for  totals),  or  231  in  all  —  which  is  a  table  of  modest  di- 
mensions. Enumeration  schedules,  it  should  be  noted,  are 
not  often  of  a  character  to  raise  this  question  in  just  this 
form. . . .
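
The three figures in this illustration (200 columns, 30 columns, and 231 entry-places) may be reproduced in Python. The function names are hypothetical, and the reading of the 31 totals as 10 line totals, 20 column totals, and one grand total is an assumption; the text gives only the sum.

```python
def columns_for_individual_returns(stub_items, captions):
    """One column for every entry-place of the schedule, one line per return."""
    return stub_items * captions


def columns_for_alternative_tabulation(stub_items, captions):
    """Only the cross and down totals of each schedule are carried over."""
    return stub_items + captions


def entry_places_for_combined_totals(stub_items, captions):
    """One table of the same form as the schedule, with its totals."""
    cells = stub_items * captions
    totals = stub_items + captions + 1   # assumed make-up of the 31 totals
    return cells + totals


stub_items, captions = 10, 20
full = columns_for_individual_returns(stub_items, captions)
alternative = columns_for_alternative_tabulation(stub_items, captions)
combined = entry_places_for_combined_totals(stub_items, captions)
# full == 200, alternative == 30, combined == 231
```

With 50 returns, the first arrangement fills 200 × 50, or 10,000, entry-places, against 30 × 50, or 1,500, for the alternative tabulation.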

With  our  present-day  mechanical  facilities  for  "tabula- 
tion," the  process  of  subdivision  and  cross-classification  of 
aggregates  is  limited  rather  by  the  degree  of  significance  of 
the  results,  and  by  the  cost  and  awkwardness  of  voluminous 
reports,  than  by  the  time  required  to  make  the  necessary 
sortings  and  counts  of  cards  already  punched.  While  the 
mathematical  theory  of  combination  is  a  good  point  of  de- 
parture in  planning  tables,  most  combinations  of  the  terms 
of  diverse  classifications,  even  if  they  occur,  have  no  concrete 
significance. 

Comprehensiveness,  Comparability,  and  Compactness  as 
Essentials  of  Good  Statistical  Tables.  —  The  significance  of  a 
statistical  table,  as  of  statistics  generally,  depends  very  largely 
upon  its  being  comprehensive  for  the  field  it  covers.  Truth 
in  its  statistical  aspect  is  representativeness.  The  only  ab- 
solute guaranty  of  the  representative  quality  of  an  aggregate 
is  that  it  reflects  all  the  units  within  its  scope.  According 
to  the  mathematical  theory  of  probabilities,  much  less  is 
necessary,  but  this  theory  does  not  take  account  of  the  selec- 
tive tendency  of  events  and  of  observation,  for  which  the 
statistician  must  be  continually  on  his  guard.  The  point  is 
illustrated  by  the  well-known  difference  in  quality  between 
results  obtained  by  complete  enumeration  and  those  obtained 
from  a  circular  letter  or  questionnaire. 


A  table  should  not  be  composed  of  mere  samples.  It  is 
better  to  make  it  of  narrow  scope  but  comprehensive  as  far 
as  it  goes,  i.e.  within  its  territorial  or  other  limits.  A  table, 
furthermore,  is  likely  to  be  one  of  a  series,  which  should 
all  be  on  the  same  basis,  or,  at  least,  conform  sufficiently  to 
the  basis  of  the  series  so  that  its  representative  quality  and 
the  comparability  of  its  totals  are  not  appreciably  impaired. 
The  most  surely  understood  uniform  basis,  meeting  all  the 
requirements  of  comparability,  is  the  comprehensive  basis. 
When  a  table  falls  short  of  the  basis  of  its  fellows,  but  in  a 
way  not  such  as  to  compel  its  omission  altogether,  the  appro- 
priate place  to  indicate  what  is  lacking  is  a  general  note. 
Sometimes  it  may  be  well  to  have  two  sets  of  totals  to  a  table, 
one  on  the  most  comprehensive  basis,  and  one  less  compre- 
hensive, but  such  as  to  supply  aggregates  for  data  that, 
though  falling  short  of  perfect  comprehensiveness,  may  be  of 
qualified  value  in  other  ways,  as  for  example,  in  the  computing 
of  ratios.  On  the  other  hand,  if  it  is  desirable  to  present  in- 
formation in  connection  with  only  one  of  a  series  of  tables,  it  is 
well,  in  order  to  avoid  impairing  the  comparability  of  one 
table  with  the  others  of  the  series,  to  put  the  data  that  exceed 
the  standard  scope  in  brackets  and  not  take  them  into  the 
totals,  thus  letting  them  be  in  the  table  for  purposes  of 
reference,  but  not  strictly  of  it.  Uniform  comprehensiveness 
upon  some  definable  basis  is  the  ideal  standard.  Even  a 
small  per  cent  impairment  of  comprehensiveness  may  mean 
a  large  decrease  in  tabular  efficiency. 
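
The device of bracketing data that exceed the standard scope, while keeping them out of the totals, may be sketched as follows. The district names and figures are invented for illustration, and the layout is only one possible rendering.

```python
# Items outside the standard scope of the series are shown in brackets
# for reference but are not taken into the total, so that the total
# stays comparable with the other tables of the series.
# All data and names below are invented for illustration.

rows = [
    ("District A", 1200, True),    # (label, value, within standard scope?)
    ("District B", 850, True),
    ("District C", 430, False),    # exceeds the standard scope
]

def render(rows):
    lines = []
    total = 0
    for label, value, in_scope in rows:
        if in_scope:
            total += value
            shown = f"{value:,}"
        else:
            shown = f"[{value:,}]"     # in the table, but not of it
        lines.append(f"{label:<12}{shown:>10}")
    lines.append(f"{'Total':<12}{total:>10,}")
    return lines, total

lines, total = render(rows)
print("\n".join(lines))
```

The bracketed entry remains available for reference, while the total covers only the items common to the whole series.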

The  same  principle  applies  with  reference  to  corresponding 
tables  for  a  series  of  years.  While  it  is  desirable  that  new 
data  be  made  use  of,  full  notice  of  a  change  of  basis  should 
be  given  and  it  is  often  well  to  give  figures  and  make  com- 
parisons on  both  the  old  and  the  new  basis  for  the  first  year 
of the change. Especially in derivative tables attention to
comparability  is  imperative,  without  regard  to  cost  in  the 
way  of  added  complexity,  etc.  Ratios,  for  example,  should 
usually  be  given  on  both  bases  where  there  is  a  change. 
This  again  is  a  question  of  representativeness,  though  here 
differences  between  aggregates,  rather  than  the  aggregates 
themselves,  are  under  consideration.  How  important  this 
question  is  in  another  of  its  phases  is  illustrated  by  the  place 
commonly  given  to  averages,  i.e.  representative  numbers, 
as  the  gist,  if  not  the  substance,  of  statistics. 

The  complement  of  the  requirement  of  comprehensiveness 
is  that  of  compactness.  It  is  of  the  essence  of  a  table  to  con- 
vey a  large  amount  of  information  in  a  small  space.  Hence 
sparsely  tenanted  columns  are  an  eyesore,  and  blank  columns, 
even  where  the  original  classification  may  have  reasonably 
planned to use them, should not be tolerated. Blank lines
are  hardly  less  justifiable.  Classifications  should  be  revised 
when  the  data  as  spread  out  show  such  waste  of  space.  Un- 
represented classes  may  be  disposed  of  in  the  notes.  Sparsely 
tenanted  columns  should  be  consolidated,  subdivisions  of 
entries  being  indicated  by  footnotes  if  desirable.  A  "mis- 
cellaneous" column  may  often  be  employed  with  reference 
to  such  residual  classes.  It  should  never  include  more  than 
a  small  per  cent  of  the  material  of  the  table.  But  sometimes 
the  desirability  of  keeping  up  tables  on  a  uniform  plan,  e.g. 
through  a  series  of  years,  may  justify  continuing  sparse 
columns  till  a  comprehensive  overhauling  of  the  form  of 
tables  is  undertaken. 
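
The consolidation of sparse columns can be sketched in code. The text fixes no numerical limits, so the two thresholds below (a column under 1 per cent of the whole is "sparse"; the miscellaneous column should not exceed 5 per cent) are assumptions chosen only for illustration, as are the commodity names and figures.

```python
# Column totals of a hypothetical table; names and figures are invented.
column_totals = {"Coal": 5200, "Iron": 3100, "Copper": 1400,
                 "Zinc": 40, "Tin": 25, "Nickel": 0}

def consolidate(totals, min_share=0.01, max_misc_share=0.05):
    """Merge sparse columns into 'Miscellaneous'; list empty ones for the notes.

    min_share and max_misc_share are assumed thresholds, not from the text.
    """
    grand = sum(totals.values())
    kept, misc, unrepresented = {}, 0, []
    for name, value in totals.items():
        if value == 0:
            unrepresented.append(name)      # to be disposed of in the notes
        elif value / grand < min_share:
            misc += value                   # merged into "Miscellaneous"
        else:
            kept[name] = value
    if misc:
        kept["Miscellaneous"] = misc
    # The residual column should hold only a small per cent of the material.
    misc_ok = misc / grand <= max_misc_share
    return kept, unrepresented, misc_ok

kept, unrepresented, misc_ok = consolidate(column_totals)
print(kept, unrepresented, misc_ok)
```

If `misc_ok` comes out false, the residual class has grown too large and the classification itself should be revised rather than hidden under one heading.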

The  table  must  ordinarily  be  planned  with  reference 
to  fitting  the  printed  page,  as  single-page  lengthwise,  single- 
page  upright,  twin  upright,  or  as  a  series  of  such.  Hence 
dimensions  in  terms  of  columns  and  lines  must  often  be 
carefully  studied  before  being  finally  fixed.  The  large  page 
and the resulting unwieldy size of most statistical volumes
are due to the need of space for manoeuvering the tabular
matter.  Often  the  presentation  in  sections  of  what  is  func- 
tionally one  table  becomes  necessary. 

General  Tables  and  Derivative  Tables  Distinguished.  —  A 
table  serving  primarily  the  purpose  of  a  repository  of  com- 
prehensive statistical  data  is  distinguished  as  a  general  table, 
also,  with  reference  to  its  being  closest  to  the  original  data, 
as  a  primary  table. 

Derivative  tables  are  summaries  and  auxiliary  ratio  tables. 
They  may  be  usually  distinguished  as  text  or  analysis  tables. 
But  some  ratio  tables,  or  at  least  some  ratios,  are  often  in- 
cluded among  general  tables.  Derivative  tables  are  based 
upon  general  tables  and  contain  matter  suitable  for  incorpo- 
ration in  analysis.  They  may  vary  in  form  from  year  to  year 
according  to  the  exigencies  of  the  situation  and  according  to 
the  points  emphasized  in  the  text.  Unlike  the  general  tables 
they  will  usually  contain  data  and  comparisons,  including 
absolute  and  per  cent  increases,  for  several  years.  Just  as 
general  tables  serve  to  show  in  terms  of  absolute  numbers 
the  composition  of  aggregates,  a  derivative  table  frequently 
serves  the  purposes  of  explanation  correspondingly  by  means 
of  per  cent  distribution.  If  text  tables  contain  data  taken 
direct  from  returns,  these  are  so  treated  because  of  lack  of 
comprehensiveness  in  the  data,  or  of  perennial  interest  in  that 
kind of data. Explanatory and qualifying statements contained
in general-table footnotes should, unless unimportant,
be  either  repeated  or  referred  to  in  footnotes,  or  in  text  im- 
mediately adjacent  to  the  text  tables. 
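
How a derivative table is computed from a general table may be sketched thus. The classes and figures are invented; the sketch assumes a general table giving absolute numbers for two years and derives the per cent distribution together with the absolute and per cent increases.

```python
# A hypothetical general table: class -> (earlier year, later year).
# All figures are invented for illustration.
general = {
    "Class A": (400, 500),
    "Class B": (350, 300),
    "Class C": (250, 200),
}

def derive(general):
    """Build derivative entries: per cent distribution and increases."""
    later_total = sum(later for _, later in general.values())
    derived = {}
    for name, (earlier, later) in general.items():
        increase = later - earlier
        derived[name] = {
            # share of the later-year aggregate
            "per_cent_of_total": round(100 * later / later_total, 2),
            # absolute change between the two years
            "increase": increase,
            # change relative to the earlier year
            "per_cent_increase": round(100 * increase / earlier, 2),
        }
    return derived

derived = derive(general)
for name, entry in derived.items():
    print(name, entry)
```

The ratios are carried to two decimal places here only as a convention; the general table, with its unrounded absolute numbers, remains the source against which they can be verified.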

It  is  the  common  practice  of  statistical  bureaus  to  number 
tables  serially  for  each  report.  If  Roman  numerals  are  used 
for the general tables, Arabic numerals are used for derivative
tables,  or  vice  versa.  .  .  . 

No strict line can be, or need be, drawn between what
should  go  into  general  and  what  into  text  tables,  though  the 
fact  that  ratios  are  logically  a  part  of  the  analysis  gives  the 
analytical  text,  if  there  is  any  such,  a  strong  claim  upon  them. 
Grand  totals  certainly  go  with  the  general  tables  not  only 
as  closing  them  up  but  also  because  of  their  importance  as  a 
proof  check.  But  divisional  totals  serving  the  purpose  of  a 
summary  may  go  in  either  place.  Ratios,  too,  may  come  to 
have  so  thoroughly  well  established  a  place  as  to  be  in  effect 
a  part  of  the  data  that  the  public  will  expect  to  find  in  con- 
nection with  the  general  tables.  A  derivative  table  in  a  re- 
port containing  the  corresponding  primary  tables  is  seldom 
to  be  considered  a  thing  by  itself  to  the  extent  of  requiring 
no  reference  to  its  sources  on  the  part  of  a  reader  who  uses  it 
carefully. 

Comparisons with previous years, or with corresponding
months (or other portions) of previous years, are also
strictly  a  part  of  analysis,  but  their  significance  is  so  direct 
and  their  meaning  in  general  so  unmistakable  that  some  of 
them  may  well  be  looked  for  in  the  general  tables.  They 
are  made  much  of  especially  in  commercial  and  financial 
statistics.  The  United  States  Census  is  liberal  in  present- 
ing comparisons  for  previous  decennial  years  in  its  general 
tables. 

General  or  primary  tables  rightly  occupy  the  largest  place 
in  most  government  statistical  publications.  Indeed,  some 
official  statisticians  feel  that  the  preparation  and  presentation 
of  the  primary  tables  is  their  whole  duty.  But  some  work- 
ing-over of  the  raw  material  by  those  directly  concerned  with 
its  compilation  is  desirable,  if  for  no  other  reason  than  the 
beneficial  reaction  on  the  original  data  and  tables  consequent 
upon  analyzing  and  applying  them  to  the  solution  of  scien- 
tific and  practical  problems.  Proper  emphasis  upon  the 
function of such statistical publications as sources does not
preclude  brief  suggestive  analysis,  in  addition  to  the  necessary 
descriptive  and  cautionary  remarks. 

The  Rounding  and  Abbreviation  of  Numbers.  —  The  use  of 
rounded  or  cut-off  numbers  should  seldom  be  adopted  in 
general  or  primary  tables,  though  doubtless  desirable  in 
derivative  or  interpretative  tables.  The  practice  is  often 
recommended  without  reference  to,  or  due  emphasis  upon, 
this  very  necessary  qualification. 

Even  in  derivative  tables,  the  giving  of  a  large  number, 
for  example,  millions  of  inhabitants,  to  the  last  digit  would 
mislead  by  its  supposed  suggestion  of  "spurious  accuracy" 
only  in  the  case  of  a  reader  who  would  have  at  least  equal 
difficulty  in  understanding  what  the  rounding  of  the  figures 
meant.  The  notion  that  we  should  print  numbers  showing 
the  digits  only  in  so  far  as  they  are  known  to  be  accurate, 
or  on  the  basis  of  the  theory  of  probabilities  considered  to  be 
so,  is  impractical  to  the  height  of  absurdity.  The  truth  of 
the  stated  population  of  New  York  City  —  4,766,883  in  1910 
—  is  not  of  a  nature  to  imply  that  the  figure  3  in  the  units 
place  has  statistical  significance.  The  statistician  knows  that 
the  last  four  digits  are  neither  more  nor  less  accurate  or  truth- 
ful if  made  to  read  7000  instead  of  6883.  He  does  not  need 
to  be  reminded  that  the  117  has  no  objective  or  exact  mean- 
ing in  such  an  aggregate.  It  is  seldom  necessary  to  indicate 
that  large  numerical  aggregates  are  approximate  as  to  the 
right-hand  figures. 

But  there  is  also  a  positive  objection  to  the  rounding  of 
such  numbers.  From  the  point  of  view  of  statistical  admin- 
istration it  is  important  that,  for  example,  the  population 
of  a  large  area  be  the  total  for  all  its  parts  down  to  the  smallest 
district  for  which  separate  figures  are  given,  some  of  which 
in the instance referred to actually have less than 117 inhabitants.
Rounding an absolute number is never obligatory
and should never be done in a way to deprive any one
of the possibility of completely checking the number and of
using  for  this  purpose,  if  for  no  other,  the  unmodified  orig- 
inal aggregate.  Primary  numerical  data  should  not  be 
rounded. 

As  regards  ratios,  too,  their  mechanical  computation  with 
equal  ease  to  a  larger  as  to  a  smaller  number  of  places  makes 
the  decision  of  how  far  they  should  be  carried  a  question  of 
conventional  expectations  and  of  economy  of  attention 
rather  than  anything  more  fundamental.  This  statement 
does  not  refer  to  (and  does  not  apply  for)  slide-rule  compu- 
tations. The  carrying  out  of  ratios  to  two  decimal  places 
(or  for  per  cent  to  hundredths  of  one  per  cent)  seems  to  be 
the  most  satisfactory  practice  for  most  cases,  so  far  as  frac- 
tions are  desirable,  though  only  the  first  place  will  usually 
be  itself  significant,  the  second  serving  rather  to  qualify  the 
first.  Where  three  decimal  places  are  used,  the  printer,  and 
sometimes  the  reader,  will  easily  mistake  the  point  for  a 
comma. 

But  much  depends  on  how  far  it  is  the  statistician's  aim  to 
make  his  material  popular  —  an  end  that  is,  of  course,  entirely 
worthy in itself. The desirability of rounded and abbreviated
numbers, also of the use of few numbers, in statistical
exposition  is  chiefly  of  the  same  nature  as  are  the  claims  of 
stylistic  elegance  or  of  force  (as  a  writer  may  prefer  or  the 
conditions  require)  in  the  use  of  the  English  language.  The 
first  duty  of  one  presenting  statistical  results  is  to  be  adequate 
and  accurate ;  if  possible  it  is  well  for  him  to  be  also  elegant, 
or  forcible,  or  whatever  else  may  be  desirable,  in  his  choice 
of  words  and  of  numerical  expressions. 

The  process  of  rounding  or  cutting  off  numbers  is  by  no 
means simple or a matter of course. On the contrary, it requires
considerable statistical technique; else totals will
be  found  not  to  check  with  items  and  ratios  not  with  the  data 
from  which  they  are  derived.  It  may  be  noted  incidentally 
that  where  it  may  seem  desirable,  as  frequently  in  the  case 
of  estimates,  to  round  or  abbreviate  both  a  relative  number 
and  the  corresponding  absolute  number,  one  cannot  do 
both  and  at  the  same  time  preserve  the  requisite  verifiable 
relation  between  the  two.  This  fact  counts  against  the 
rounding  even  of  estimates,  though  some  sign  of  approxima- 
tion is  in  such  cases  especially  desirable. 
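
The failure of rounded figures to check may be shown by a small sketch. The New York City figure is that of the text; the three district figures and the helper `round_to` are invented for illustration.

```python
def round_to(value: int, unit: int) -> int:
    """Round a non-negative integer to the nearest multiple of unit (halves up)."""
    return ((value + unit // 2) // unit) * unit

# The population figure cited in the text, rounded to thousands.
new_york_1910 = 4_766_883
rounded_city = round_to(new_york_1910, 1_000)
difference = rounded_city - new_york_1910

# Invented district figures: rounding the parts and rounding the total
# give results that no longer check with one another.
districts = [1_449, 2_449, 3_449]
exact_total = sum(districts)
rounded_items = [round_to(d, 1_000) for d in districts]
sum_of_rounded = sum(rounded_items)
rounded_total = round_to(exact_total, 1_000)

# A ratio computed from rounded figures departs from the exact ratio.
pct_exact = round(100 * districts[0] / exact_total, 2)
pct_from_rounded = round(100 * rounded_items[0] / rounded_total, 2)

print(rounded_city, difference)
print(sum_of_rounded, rounded_total)
print(pct_exact, pct_from_rounded)
```

Each part here is rounded down while the total is rounded to a different thousand, so the rounded items sum to 6,000 against a rounded total of 7,000; the unmodified aggregates are needed to restore the check.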

Tabular  Notation.  —  The  rounding  and  abbreviation  of 
numbers  is  strictly  a  part  of  the  subject  of  tabular  notation, 
but  so  fundamental  as  to  affect  the  character  of  the  statistical 
table  as  such.  The  word  "notation"  properly  refers  to  the 
relation  between  the  signs  and  symbols  used  to  convey  the 
meaning  of  any  part  of  the  table  and  the  significance  arbi- 
trarily or  conventionally  attaching  to  them.  To  illustrate, 
it  would  seem  that  the  last  two  digits,  83,  of  the  figure  for 
the  population  of  New  York  City  in  1910,  preceded  as  they 
are  by  five  other  digits  having  the  significance  of  position 
proper to them according to the Arabic numerical notation,
ought,  without  difficulty,  to  be  interpreted  as  having  a  differ- 
ent statistical  significance  from  the  figure  83  as  arrived  at, 
for  example,  by  a  careful  housewife  on  inventorying  her  pieces 
of  silverware  preparatory  to  putting  them  into  safe  deposit, 
or  by  a  dairyman  counting  his  stock. 

The signs used in tabulation are chiefly Arabic numerals
and  the  letters  of  the  alphabet  in  their  various  appropriate 
combinations.  The  position  of  such  a  sign  may  be  a  part 
of  the  notation.  The  notation  of  a  table  is  the  language  in 
which  its  import  is  expressed ;  and  that  language  should  be 
as  direct,  concise,  and  unambiguous  as  it  is  possible  to 
make  it. 

The technique of statistical notation has not reached a
high  stage  of  development.  The  writer,  at  any  rate,  feels 
that  the  tendency  among  statisticians  to  treat  a  table  as  a 
mere  repository  of  numbers  and  to  indicate  in  footnotes  any 
state  of  facts  not  so  represented  is  objectionable.  The  ab- 
sence of  a  report,  the  failure  to  segregate  returns,  the  character 
of  an  entry  as  estimated  or  as  incomplete  —  all  these  are  mat- 
ters that  can  be  shown  by  appropriate  signs  on  the  face  of  the 
table.  The  best  policy  would  seem  to  be  to  make  the  tabular 
entries  self-explanatory  to  as  high  a  degree  as  possible,  for 
the  purposes  of  the  particular  tabulation,  by  the  use  of  word 
or  other  non-numerical  sign  entries  where  feasible.  Foot- 
notes are  thus  reserved  to  supplement  or  qualify  both  numer- 
ical and  sign  entries  and  especially  are  not  intended  to  take 
the  place  of  lacking  numbers.  But  the  technique  of  tabular 
notation  lies  outside  the  scope  of  a  discussion  of  the  general 
aspects  of  statistical  tabulation. 
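
The policy of showing such states of fact by signs on the face of the table may be sketched as follows. The text prescribes no particular signs, so the marks used here ("n.r.", "(a)", a trailing "e" or "*") and all names are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    value: Optional[int] = None   # None when no number is available
    status: str = "reported"      # "reported", "estimated", "incomplete",
                                  # "no_report" or "not_segregated"

# Hypothetical sign conventions; the source text names none in particular.
SIGNS = {"no_report": "n.r.", "not_segregated": "(a)"}
SUFFIX = {"estimated": "e", "incomplete": "*"}

def render(entry: Entry) -> str:
    """Return the text to print in the table cell for this entry."""
    if entry.value is None:
        # A sign entry takes the place of the lacking number.
        return SIGNS[entry.status]
    # A numerical entry carries a mark when it is estimated or incomplete.
    return f"{entry.value:,}{SUFFIX.get(entry.status, '')}"

cells = [Entry(1250), Entry(980, "estimated"),
         Entry(None, "no_report"), Entry(None, "not_segregated")]
print([render(c) for c in cells])
```

With entries so marked, footnotes are left free to supplement or qualify the entries rather than to stand in for missing numbers.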

REVIEW 

1.  Why  may  a  statistical  table  be  spoken  of  as  a  "surface"? 
From  what  angles  may  such  a  surface  be  viewed? 

2.  Contrast  caption-  and  stub-headings.  May  they  always  be 
interchanged?  Why?  Work  out  a  "treble"  table,  and  inter- 
change the  headings.  What  is  the  result?  What  conditions  con- 
trol the  order  of  items  in  both? 

3. Formulate a general statement showing the "Limitations
upon Tabular Presentation." How are these overcome?

4. Why may "comprehensiveness, comparability, and compactness"
be held to be essentials of statistical tables?

5.  Contrast  general  and  derivative  tables. 

6. How is the practice of rounding and abbreviating numbers
in tabulation related to accuracy, to "spurious accuracy," to compensation
of errors, to the serviceability of tables?


Standardization of the Construction of
Statistical Tables¹

The  progress  of  every  art  should  be  marked  by  the  ac- 
cumulation of  an  increasing  stock  of  generally  accepted 
practices.  As  these  practices  obtain  common  approval, 
they  should  be  recognized  as  standard  and  regularly  fol- 
lowed until  more  satisfactory  methods  are  discovered. 
A  measure  of  standardization  is  thus  a  normal  feature  of 
development. 

Standardization  of  statistical  practices  should  not  be 
invited,  however,  without  recognition  of  its  dangers.  Like 
"law  and  order"  in  civil  life,  standardization  may  easily 
be  overdone.  There  is  always  the  risk  of  formalism.  But 
kept  within  proper  limits,  standardization  has  a  steadying 
influence  which  tends  to  accelerate,  not  retard,  the  im- 
provement of  statistical  exposition.  It  effects  good  order, 
and  is  an  unmistakable  mark  of  real  progress. 

It  is  consequently  profitable  to  consider  from  time  to 
time the extent to which standardization can advantageously
be accepted. In statistical exposition, the standardization
of graphic methods has been one of the gratifying
advances of recent years. To what extent has there
been  and  to  what  extent  are  there  further  opportunities 
for  a  similar  standardization  of  practice  in  the  methods  of 
tabular  presentation  ? 

In  considering  this  question,  it  should  not  be  thought 
that  standardization  is  accomplished  only  through  the 
conscious    adoption   of    rules    and    regulations   set   up   by 

1  Taken  with  permission  from  Day,  Edmund  E.,  "Standardization  of 
the  Construction  of  Statistical  Tables,"  a  paper  read  at  the  Eighty-first 
Annual  Meeting  of  the  American  Statistical  Association,  Chicago,  Decem- 
ber, 1919,  and  later  published  in  revised  form  in  the  Quarterly  Publications 
of  the  American  Statistical  Association,  March,  1920,  pp.  59-66. 


recognized  organs  of  authority.  Standardized  statistical 
practices  may  evolve  by  imperceptible  degrees  through  the 
influences  of  imitation  and  prestige.  This  is  particularly 
the  case  if  some  one  statistical  bureau  is  the  fountain-head 
of  governmental  practice.  The  working  rules  of  such  an 
office  tend  to  become  the  rules  of  a  following  of  less  in- 
fluential practitioners.  Standardization  of  this  kind  is 
going  on  at  all  times.  Such  standardization  of  practice 
as  we  have  to-day  in  statistical  work  in  this  country  is 
almost  altogether  the  result  of  the  influences  of  imitation 
and  prestige. 

Unconscious  standardization  of  this  sort  has  already 
made  substantial  progress  with  regard  to  the  structure 
of  statistical  tables.  Without  attempting  a  complete  enu- 
meration of  the  rules  observed  by  competent  authorities, 
a  few  of  the  standard  practices  may  be  noted  in  passing. 
Thus  it  is  generally  recognized  :  (1)  that  every  table  should 
be  self-sufficing,  containing  within  itself  a  clear  explana- 
tion of  the  meaning  of  the  items  displayed ;  (2)  that  every 
table  should  be  logically  a  unit,  containing  only  data  which 
are  intimately  related  with  one  another ;  (3)  that  column- 
and  row-headings  should  be  brief,  unambiguous,  and  self- 
explanatory,  table  footnotes  being  used  when  necessary 
to  make  the  headings  perfectly  clear;  (4)  that  coordinate 
and  subordinate  relationships  among  the  column-  and  row- 
headings  should  be  shown  by  variations  of  boxing  in  the 
captions  and  of  indentation  in  the  stub ;  (5)  that  varieties 
of  letters,  figures,  lines,  column-widths,  and  interlinear 
spacings  should  be  employed  to  facilitate  easy  and  intelli- 
gent use  of  the  table ;  (6)  that  columns  and  rows  should  be 
lettered  or  numbered  if  cross  reference  is  desirable ;  and 
(7)  that  sources  and  units  should  invariably  be  indicated. 
The  common  acceptance  of  these  principles  represents  no 


CLASSIFICATION  — TABULAR  PRESENTATION      261 

mean  advance  in  the  standardization  of  statistical  table 
structure. 

It  is  to  be  observed,  however,  that  the  standardization 
thus  far  effected  concerns  primarily  the  constituent  parts 
of  the  table,  not  the  table's  general  form.  The  choice  of 
position  between  columns  and  rows,  the  arrangement  of  the 
several  columns  or  the  several  rows,  and  the  location  of 
particular  columns  toward  the  left  of  the  table  or  of  particular 
rows  toward  the  top,  seem  still  to  be  matters  of  individual 
preference,  if  not  of  chance.  It  is  important  to  consider  how 
far  standardization  of  the  general  form  of  statistical  tables 
is  feasible  and  desirable. 

Standardization  of  the  general  form  of  statistical  tables 
must  begin  with  a  distinction  between  general-purpose 
and  special-purpose  tables.  The  general-purpose  table  is 
designed  to  bring  together  in  most  convenient  and  accessible 
form  all  the  data  bearing  upon  a  given  topic.  The  special- 
purpose  table  is  intended  to  throw  into  relief  relationships 
of  special  significance  in  a  given  study.  The  general-pur- 
pose table  is  an  orderly  presentation  of  statistical  ma- 
terial ;  the  special-purpose  table,  a  record  of  the  results  of 
statistical  analysis.  Of  course,  a  measure  of  analysis  is  a 
prerequisite  even  of  the  general-purpose  table,  but  the 
analysis  is  of  a  different  order.  It  is  the  analysis  essential 
to  effective  enumeration  and  tabulation,  not  the  analysis 
accompanying  specific  interpretation.  The  analysis  re- 
quired for  the  special-purpose  table  is  directed  toward  a 
particular  issue.  The  problems  of  good  table  structure 
are  essentially  different  for  the  two  types  of  tables. 

Since  the  construction  of  the  general-purpose  table  is  the 
simpler case, it will be examined first. In considerable
measure,  the  general-purpose,  or  primary,  table  is  a  creature 
of the physical form of the medium in which it appears.
Upon  the  one  hand,  the  table  tends  to  expand  to  accommo- 
date the  large  body  of  data  pressing  for  inclusion.  Upon 
the  other  hand,  the  capacity  of  the  printed  page  —  even 
if  it  be  folio  —  stands  as  a  limit  on  the  indefinite  enlarge- 
ment of  the  table.  Tables  which  are  allowed  to  exceed  the 
dimensions  of  the  page  and  have  to  be  folded  in  are  every- 
where recognized  as  objectionable.  Loose  tables,  sepa- 
rately printed  in  large  irregular  sizes,  are  as  bad,  if  not 
worse.  Tables  running  across  two  pages  facing  one  an- 
other are  reasonably  satisfactory  but  are  to  be  avoided 
where  possible.  Tables  which  are  presented  at  right 
angles  to  the  text  fall  into  the  same  class.  In  general, 
the  single  page,  held  as  when  reading  the  text,  is  the 
maximum  size  to  which  the  statistical  table  should  be  per- 
mitted to  run.  Primary  tables  usually  press  upon  this  phys- 
ical limit ;  their  outside  dimensions  are  thus  independently 
determined. 

Within  the  table,  similar  influences  are  at  work.  Whether 
given  arrays  of  data  shall  be  exhibited  in  columns  or  in 
rows  is  commonly  a  question  of  the  difference  in  the  vertical 
and  horizontal  capacity  of  the  page.  The  maximum  number 
of  lines  in  a  table  is  several  times  greater  than  the  maximum 
number of columns. Consequently the arrays having the
greatest  number  of  items  are  naturally  assigned  to  the 
columns,  the  other  arrays  to  the  rows.  Once  a  given  set  of 
headings  has  appeared  in  caption-  or  stub-position,  there 
is  a  strong  presumption  in  favor  of  its  occupying  the  same 
position  in  other  related  tables,  for  the  transcription  of  data 
from  general  tables  is  thereby  facilitated.  Upon  the  whole, 
however,  the  assignment  of  columns  and  rows  rests  funda- 
mentally upon  the  greater  capacity  of  the  column :  a  factor 
not  subject  to  modification  by  the  statistician. 
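
The assignment just described can be put as a simple rule: the classification with the greater number of items has its headings placed in the stub, so that its items run down the page. A minimal sketch follows, with invented headings and a hypothetical function name.

```python
def assign_positions(class_a, class_b):
    """Return (stub, caption): the longer classification goes to the stub.

    Stub headings run down the page, where many more lines are available
    than there are columns across it.
    """
    if len(class_a) >= len(class_b):
        return class_a, class_b
    return class_b, class_a

# Invented headings for illustration.
states = ["Maine", "New Hampshire", "Vermont", "Massachusetts",
          "Rhode Island", "Connecticut"]
items = ["Population", "Area"]

stub, caption = assign_positions(items, states)
print("stub:", stub)
print("caption:", caption)
```

The rule looks only at the number of headings; the presumption in favor of keeping a set of headings in the position it occupies in related tables is not modeled here.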

A much larger measure of option may be exercised in fixing,
in a general-purpose table, the order of columns and of
rows. Almost any systematic plan may be adopted; but
the  most  satisfactory  arrangements  are  the  alphabetical, 
chronological,  geographical,  or  according  to  the  magnitude 
of  the  items.  There  are  no  grounds  for  urging  the  adoption 
of  any  one  or  two  of  these  arrangements  to  the  exclusion 
of  the  others.  Now  one  best  serves ;  now  another.  One 
rule,  however,  should  govern  the  final  selection  in  all  cases : 
that order should be employed which keeps the details
of  the  table  most  generally  accessible.  Readers  will  come 
to  the  table  with  a  variety  of  interests.  They  should  be  given 
that  table  from  which  in  general  they  can  most  easily  draw 
the  information  they  seek.  Arrangement  according  to 
magnitude  or  importance  of  items  is  less  satisfactory  in 
general-purpose,  than  in  special-purpose,  tables,  because  it 
depends  upon  analysis  from  a  single  point  of  view  and  it  is 
frequently  unwise  to  commit  the  table  to  this  particular 
viewpoint.  The  other  arrangements  better  meet  the  variety 
of  needs  which  a  primary  table  is  designed  to  serve.  The 
important  end  is  to  secure  some  logically  and  commonly 
understood  arrangement  which  opens  the  table  to  easy 
transcription. 

When  geographical  or  chronological  orders  are  adopted, 
a  decision  has  to  be  reached  as  to  what  items  to  place  at 
the  top  and  left  and  what  items  at  the  bottom  and  right. 
In  the  tabular  arrangement  of  the  states  of  this  country 
the  grouping  and  order  followed  by  the  Bureau  of  the  Cen- 
sus may  be  recognized  as  standard ;  the  northern  New  Eng- 
land states  stand  at  the  head  of  the  list,  the  southern  Pacific 
states  at  the  foot.  In  general,  the  best  statistical  prac- 
tice for  this  country  would  seem  to  run  geographical  series 
from north to south and from east to west. With chronological
series the case is not so clear. Upon the whole,
however,  for  general-purpose  tables,  the  Census  Bureau  prac- 
tice of  placing  most  recent  dates  at  the  top  and  left  seems 
commendable  if  there  is  a  fair  presumption  that  the  figures 
of  most  recent  date  will  be  most  frequently  transcribed. 
When,  however,  the  data  will  probably  be  transcribed  in 
entirety  as  time  series  it  would  seem  preferable  to  place 
the  figures  for  earlier  dates  toward  the  top  and  left.  The  rule 
to  apply  in  all  these  cases  is  simple:  the  most  generally 
useful  data  should  be  located  toward  the  top  and  left  where 
accurate  transcription  is  rendered  easier  by  close  proximity 
to  the  column-  and  row-headings. 
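
For chronological series the rule reduces to a choice between two orders, which may be sketched as follows. The function name, its flag, and the sample years are hypothetical.

```python
def order_years(years, transcribed_as_series):
    """Earliest first when the whole series will be copied out;
    otherwise most recent first, nearest the headings."""
    return sorted(years, reverse=not transcribed_as_series)

census_years = [1900, 1920, 1910]   # sample years, for illustration only

latest_first = order_years(census_years, transcribed_as_series=False)
earliest_first = order_years(census_years, transcribed_as_series=True)
print(latest_first)
print(earliest_first)
```

In either case the first element is the one placed toward the top and left, where transcription is easiest.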

The  general  or  primary  table  exhibits  no  specific  analysis. 
Its  form  is  in  considerable  measure  the  resultant  of  the  phys- 
ical limitations  of  the  page  and  the  necessity  of  present- 
ing a  maximum  body  of  data  in  a  way  to  make  the  most 
generally  useful  parts  most  readily  accessible.  The  derived 
or  analytical  table  is  a  different  statistical  device.  A  de- 
rived table  is  essentially  deficient  if  it  fails  to  exhibit  a  care- 
fully formulated  analysis.  It  should  be  constructed  to 
assist  a  specific  interpretation ;  every  effort  should  be  made  to 
make  the  table  simple ;  it  should  contain  only  those  items 
valuable  to  the  analysis,  arranged  so  as  to  encourage  the  de- 
ductions the  reader  is  expected  to  draw.  If  any  line  is  to 
be  drawn  between  statistical  tabulation  and  statistical 
analysis,  the  primary  table  displays  the  results  of  tabula- 
tion, the  derived  table  the  results  of  analysis. 

Despite  this  fundamental  distinction  between  primary 
and  derived  tables,  it  is  to  be  admitted  in  the  first  place  that 
the  derived  table  is  not  altogether  free  from  the  influences 
of  format  which  plays  so  important  a  part  in  shaping  the 
primary  table.  For  example,  if  the  number  of  subdivisions 
in  one  classification  of  an  analysis  is  much  greater  than  in 
the other, it may be necessary to put the more extended
classification  in  the  stub  simply  because  stub-capacity  is 
normally  so  much  greater  than  caption-capacity.  Simi- 
larly, if  the  designations  in  one  classification  are  much  longer 
than  in  the  other,  it  may  be  necessary  to  place  the  classi- 
fication with  longer  headings  in  the  stub,  since  neither  of 
the  alternatives  —  printing  the  longer  headings  vertically 
at  the  top  of  the  columns,  or  widening  the  columns  to  ac- 
commodate the  longer  headings  horizontally  —  is  at  all 
satisfactory.  Such  crass  considerations  as  these  are  at 
times  decisive  in  determining  the  structure  even  of  the  de- 
rived table.  But  they  play  a  much  less  important  part 
with  the  derived  table  than  with  the  primary  table.  As  a 
rule  the  statistician  is  able  to  make  the  general  form  of  the 
derived  table  serve  the  exposition  in  hand. 

One  of  the  most  fundamental  questions  of  structure 
is  the  assignment  of  data  to  columns  in  some  instances, 
to  rows  in  others.  This  matter  should  be  settled  in  the 
derived  table  with  reference  to  what  comparisons  it  is  most 
important  to  present.  Comparison  of  like  items  in  a  column 
is  much  easier  than  of  like  items  in  a  row.  It  is  believed 
that  recognition  of  this  fact  will  commonly  throw  chrono- 
logical, geographical,  and  quantitative  classifications  into 
the  stub,  qualitative  classifications  into  the  caption ;  but 
this  is  not  a  necessary  outcome.  The  important  principle 
is  to  use  the  column  position  to  promote  the  more  significant 
comparison. 
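
The principle may be illustrated by a minimal sketch in Python. The figures and the names `PAY_ROLL`, `render`, and `column` are hypothetical, invented only for illustration. The chronological classification is placed in the stub, so that the like items to be compared (one occupation in successive years) fall in a single column.

```python
# Hypothetical figures, invented for illustration: employees on the pay roll
# at the end of each year, by type of occupation.
PAY_ROLL = {
    1917: {"Laborers": 120, "Operatives": 340},
    1918: {"Laborers": 135, "Operatives": 310},
    1919: {"Laborers": 150, "Operatives": 365},
}


def render(table, stub_heading="Year"):
    """Lay out the table with the outer keys in the stub, the inner keys in the caption."""
    captions = list(next(iter(table.values())))
    lines = [f"{stub_heading:<8}" + "".join(f"{c:>12}" for c in captions)]
    for stub, row in table.items():
        lines.append(f"{stub:<8}" + "".join(f"{row[c]:>12}" for c in captions))
    return "\n".join(lines)


def column(table, caption):
    """Read one caption heading down the page: the like items to be compared."""
    return [row[caption] for row in table.values()]


if __name__ == "__main__":
    print(render(PAY_ROLL))
    print(column(PAY_ROLL, "Laborers"))
```

Were the comparison between occupations within a year the more significant one, the two classifications would be interchanged.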

Arrangement  of  the  several  columns  and  of  the  several 
rows  in  the  derived  table  will  be  determined  by  the  par- 
ticular character  of  the  analysis  in  connection  with  which 
the table is employed. If the analysis is of a temporal
distribution,  a  chronological  order  will  be  adopted ;  if  of  a 
spatial  distribution,  a  geographical  order.  If  the  items 
are component parts of an aggregate, arrangement will
be  either  according  to  the  relative  magnitude  or  importance 
of  the  item,  or  according  to  some  other  order  generally 
recognized  in  the  analysis  of  the  data  in  question.  Pre- 
sumably the  alphabetical  arrangement  will  seldom  be  fol- 
lowed, since  it  does  not  directly  disclose  significant  relation- 
ships. Ordinarily  the  purpose  of  the  analysis  will  indicate 
clearly  enough  the  order  in  which  the  columns  or  the  rows 
should  be  placed. 

Naturally  the  arrangement  of  columns  and  rows  should 
give  proper  regard  to  the  fact  that  the  most  conspicuous 
position in a statistical table is at the top and left. While
it  is  generally  true  that  derived  tables  are  designed  to  bring 
out  relationships  rather  than  individual  items  and  that  these 
relationships  are  properties  of  the  table  as  a  whole  rather 
than  of  particular  parts,  it  may  be  desirable  in  some  tables 
to  focus  attention  especially  upon  certain  more  important 
items.  When  other  considerations  will  permit,  these  more 
important  items  should  be  placed  in  the  most  exposed  posi- 
tions of  the  table :  namely,  at  the  top  and  left  next  to  the 
captions  and  stub.  This  rule  is  a  sufficient  warrant  for 
placing  totals  at  the  top  and  left  when  they  are  clearly  the 
most  significant  items  of  the  tabulation,  and  when  placing 
them  at  the  top  and  left  will  not  give  serious  offense  to  the 
users  of  the  table.  If  either  of  these  conditions  is  not  pres- 
ent it  would  seem  preferable  to  place  totals  in  the  posi- 
tions in  which  most  readers  expect  to  find  them,  namely, 
at  the  bottom  and  right.  There  appears  to  be  no  adequate 
reason  for  departing  from  the  established  practice  of  read- 
ing time  from  top  to  bottom  and  left  to  right.  In  derived 
tables,  figures  for  later  dates  should  appear  toward  the 
bottom  and  right.  It  is  the  relation  between  items,  not 
the  individual  item,  which  is  significant  in  time  series.  For 
many reasons we are accustomed to thinking of the upper
or left-hand of two figures as being the earlier, and we draw
our  conclusions  accordingly.  Furthermore,  this  rule  is 
already  thoroughly  incorporated  in  our  graphic  practices. 
To  have  diametrically  different  rules  for  graphic  and  tabu- 
lar presentation  would  be  unfortunate.  The  Census  Bureau 
practice  of  placing  data  for  most  recent  dates  at  the  top 
and  left  is  therefore  not  to  be  approved  for  the  derived  table. 
Effective  exposition  of  the  statistical  evidences  is  better 
served by the order which seems most natural to the great
majority  of  readers.  Arrangements  of  columns  and  rows 
should hold fast to the purpose of facilitating interpretation.

If  the  dominant  purpose  of  the  derived  table  be  kept  in 
mind, many problems of tabular arrangement will be readily
solved. Percentage distributions will be placed next to
the corresponding absolute figures or in a separate portion
of  the  table  according  to  the  emphasis  of  the  analysis.  To 
facilitate comparisons of relationship, the arrangements
adopted  in  one  table  of  an  analysis  will  be  followed  as  closely 
in  the  other  tables  as  other  more  important  considerations 
will  permit.  Columns  and  rows  which  are  to  be  compared 
with one another will be brought as closely together as
possible.  Unnecessary  digits  will  be  dropped  and  items 
given  in  round  numbers  to  simplify  the  presentation.  The 
aim  throughout  will  be  to  make  the  derived  table  an  effective 
instrument  of  statistical  exposition. 
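
Two of these points may be sketched in Python: placing a percentage distribution next to the corresponding absolute figures, and dropping unnecessary digits by giving the items in round thousands. The figures and the names `SALES`, `percent_distribution`, and `derived_rows` are hypothetical, chosen only for illustration.

```python
# Hypothetical absolute figures (dollars), invented for illustration.
SALES = {"District A": 1_248_317, "District B": 764_902, "District C": 486_781}


def percent_distribution(figures):
    """Return each item's share of the total, in per cent, to one decimal."""
    total = sum(figures.values())
    return {name: round(100 * value / total, 1) for name, value in figures.items()}


def derived_rows(figures):
    """Build rows of (item, figure in round thousands, per cent of total)."""
    shares = percent_distribution(figures)
    return [(name, round(value / 1000), shares[name]) for name, value in figures.items()]


if __name__ == "__main__":
    print(f"{'District':<12}{'Thousands':>10}{'Per cent':>10}")
    for name, thousands, share in derived_rows(SALES):
        print(f"{name:<12}{thousands:>10,}{share:>10.1f}")
```

Whether the percentages stand beside the absolute figures, as here, or in a separate portion of the table depends on the emphasis of the analysis.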

If  such  are  the  considerations  involved  in  the  construc- 
tion of  statistical  tables,  what  conclusions  are  to  be  drawn 
regarding  the  possibilities  of  standardization  of  table  struc- 
ture? Upon  the  whole,  the  opportunities  for  complete 
standardization  seem  slight  except  with  regard  to  the  ele- 
ments from  which  the  table  is  to  be  constructed,  and  cer- 
tain lesser  matters  of  general  arrangement.  More  is  to  be 
gained at this time from a clear recognition of important
guiding  principles  in  table  construction.  Careful  atten- 
tion must  be  paid  to  the  difference  of  purpose  in  primary 
and  derived  tables.  The  primary  table  must  be  made 
to  offer  its  items  for  easy  transcription;  the  derived  table, 
for  ready  deduction.  If  statistical  tables  are  formed  with 
nice  regard  for  those  fundamental  aims  of  tabular  pres- 
entation, standardization  may  well  be  allowed  to  proceed 
as  it  has  heretofore  through  imitation  of  the  most  satis- 
factory existing  practices.  Untiring  experiment  with  vary- 
ing forms  and  ready  acceptance  of  improvements  are  for 
the  present  the  most  promising  means  of  securing  better 
construction  of  statistical  tables. 

REVIEW 

1.  In  the  discussion  of  the  size  of  general-purpose  tables,  what 
use  of  the  tables  has  the  author  in  mind?  Would  you  support  his 
contention  respecting  such  tables  when  they  are  prepared  for  office 
use  only?     What  criteria  on  size  would  you  set  up  for  this  use? 

2. Can stub- and caption-headings be interchanged with equally
good results, assuming that the page will comfortably admit of
either  arrangement?  Make  such  a  change,  using  the  outlines  of 
single,  double,  and  triple  tables.     What  is  the  effect  in  each  case? 

3.  Do  you  agree  with  the  author's  statement  that  "almost  any 
systematic  plan  may  be  adopted"  .  .  .  "in  fixing  in  a  general- 
purpose  table,  the  order  of  columns  and  rows?"  Compare  this 
generalization  with  the  contentions  in  the  Text. 

4. Can a line be drawn between "statistical tabulation and statistical
analysis"? What answer would the Text give to this question?

Statistical  Standards  in  Tabulating  Facts  ^ 

^ Adapted from Secrist, Horace, "Statistical Standards in Business Research,"
Quarterly Publications, American Statistical Association, March, 1920, pp. 53-54.

Tabulation is a means, first, of recording in fixed form a
classification previously developed, or second, of placing
similar facts into juxtaposition or into groups as a preliminary
to a final classification. It is a device for projecting
on  a  surface,  capable  of  being  read  in  two  dimensions,  a 
classification  which  has  been  worked,  or  is  being  worked, 
out.  It  is  a  method  of  recording  a  process  of  thought.  It 
is inelastic in structure; the facts which it contains are in
truth  "locked  up."  Classification  precedes,  tabulation  fol- 
lows. The  sequence  of  thought  is  from  purpose  to  method. 
The  statistical  standards  to  which  tabulation  must  con- 
form are  as  follows.  It  is  necessary  to  say  that  it  is  not 
my  intention  so  much  to  formulate  a  set  of  rules  governing 
the  make-up  of  tabulation  forms  as  it  is  to  develop  sta- 
tistical standards  in  tabulation  of  permanent  value,  the 
realization  of  which  may  require  a  variable  technique. 

First.  —  Every  tabulation  surface  should  faithfully  record 
the  classification  which  it  is  intended  to  depict.  The  pur- 
pose of  tabulation  and  the  standard  to  which  it  must  con- 
form cannot  be  divorced. 

Second.  —  There  is  always  a  best  form  of  tabulation  for  a 
given  purpose,  as  there  is  a  most  logical  basis  of  classifica- 
tion. Indiscriminate  choice  of  forms  is  as  much  without 
justification  as  is  a  meaningless  or  superficial  classification. 

Third.  —  Every  tabulation  should  be  adjusted  in  form 
and  complexity  (a)  to  the  subject  matter  which  is  to  be 
expressed, and (b) to the person for whom it is prepared or
the  end  to  which  it  is  addressed. 

Fourth.  —  The  order  of  detail  in  tabulation  forms  should 
be  adjusted  so  as  to  be  emphatic.  It  should  be  natural, 
not  artificial ;    convincing,  not  purposeless. 

Fifth.  —  Statistical  tables  should  carry  only  relevant 
data. The reciprocal relation between relevancy of fact
and the purpose to be accomplished by tabulation is the
thought  which  is  stressed. 

Sixth.  —  Statistical  tables  should  carry  on  their  face 
both  their  justification  and  their  explanation. 

Seventh.  —  The  details  of  statistical  tables  should  be  me- 
chanically accurate  and  their  grouping  and  arrangement 
consistent,  logical,  and  serviceable. 

Eighth.  —  The  natural  order  in  classification  is  from 
detail  to  summary;  the  serviceable  order  in  tabulation  is 
from  summary  to  detail. 

Ninth.  —  Brevity  is  said  to  be  "the  soul  of  wit."  It 
is  equally  true  that  conciseness  in  tabulation  is  the  secret 
of  its  effectiveness  for  most  practical  purposes. 

A  CENSUS  CARD 


[Facsimile of a punched census card. The punched symbols are not reproducible in text.]

This  illustration  shows  one  of  the  92,000,000  cards  used  in  tabu- 
lating the  population  returns  at  the  census  of  1910.  The  holes  in 
the  four  numbered  spaces  at  the  left  are  arbitrary  symbols  indicating 
the  state  and  district  in  which  the  person  to  whom  the  card  relates 
was enumerated; those in the other "fields" describe his characteristics.
Thus, the person to whom this card refers resided in
enumeration district No. 924 (Maynard, Middlesex County), state
of Massachusetts; was a son of the head of the family in which he
lived; mulatto; 20 years of age; native born; single; born in
Georgia ;  father  born  in  United  States ;  mother  born  in  United 
States;  spoke  English;  was  an  agricultural  laborer;  was  out  of 
employment  on  April  15,  1910 ;  was  out  of  employment  between 
7  and  13  weeks  in  1909  ;  could  read  and  write  ;  did  not  attend  school ; 
and  was  not  a  veteran  of  the  Civil  War. 
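
The idea of the card may be sketched in Python: each card carries one person's characteristics in coded fields, and tabulation consists in counting the cards by the value recorded in a chosen field. The field names, the class `CensusCard`, and the function `tally` are assumptions made for illustration and do not reproduce the Census Bureau's actual card layout. The first record follows the card described above; the other two are invented.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CensusCard:
    """One person's record; each attribute stands for one punched field (hypothetical names)."""
    state: str
    enumeration_district: int
    relationship: str
    age: int
    marital_status: str
    birthplace: str
    occupation: str
    literate: bool


CARDS = [
    # The card described in the text.
    CensusCard("Massachusetts", 924, "son", 20, "single", "Georgia",
               "agricultural laborer", True),
    # Two further cards, invented for illustration.
    CensusCard("Massachusetts", 924, "head", 47, "married", "Massachusetts",
               "operative", True),
    CensusCard("Massachusetts", 925, "daughter", 16, "single", "Massachusetts",
               "none", True),
]


def tally(cards, field_name):
    """Count the cards by the value recorded in one field."""
    return Counter(getattr(card, field_name) for card in cards)


if __name__ == "__main__":
    print(tally(CARDS, "marital_status"))
    print(tally(CARDS, "enumeration_district"))
```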

REVIEW  PROBLEMS 
Tabulation 

1. Secure some blank Hollerith Tabulating Machine Cards.
Using  the  detail  provided  on  the  schedule  form  p.  239,  showing 
descriptive  detail  of  your  house,  draft  a  Hollerith  card  form  which 
could  be  used  in  tabulating  the  data. 

2.  Draw  up  three  box  tabulation  forms  for  the  detail  of  this 
schedule  so  as  to  show  the  relation  of  the  size  of  the  houses  (1)  to 
the  number  of  rooms,  (2)  to  the  type  of  heating  equipment,  (3)  to 
the  lighting  equipment.  Give  each  table  a  suitable  title,  and 
prepare  the  forms  in  conformity  with  the  discussion  in  the  Text 
and  Readings.  If  these  conflict,  choose  the  form  which  best  suits 
your  purpose  and  justify  your  method. 

3.  The  following  data  in  relation  to  registration  at  Northwestern 
during  the  second  and  third  terms,  1918-1919,  are  to  be  tabulated 
so  as  to  compare  (1)  men  and  women,  (2)  time  of  withdrawals, 
(3)  source  of  registrants  in  the  third  term.  Follow  the  suggestions 
in  Chapter  V  of  the  Text  relative  to  the  make-up  of  tables.  Give 
the  table  a  suitable  title. 

Registrants  2d  term  1918-1919,  men,  536,  women,  844.  With- 
drawals during  the  2d  term,  1918-1919,  men,  during  the  term  31, 
at  the  end  of  the  term,  37 ;  women  during  the  term,  37,  at  end  of 
term  68.  Registrants  3d  term :  men,  from  2d  term  458,  former 
students  51,  new  students  40;  women,  from  2d  term  739,  former 
students  19,  new  students  34. 

4.  The  following  types  of  data  relative  to  employees  at  each  of 
two  establishments,  "A"  and  "B,"  are  available. 

(1)  Length  of  service  —  expressed  in  weeks  (length  of  service 
groups). 

(2)  Type  of  occupations  —  laborers  and  operatives. 


(3)  Number  on  pay  roll  at  end  of  year. 

(4)  Number  separated  during  the  year. 

a. Draw up a table form so that those on the pay roll at the end of
the year may be compared directly with those who separated during
the  year  for  each  type  of  occupation  for  each  of  the  establishments. 
(Use  the  length  of  service  groups  as  the  stub.) 

b.  Draw  up  a  table  form  so  that  the  laborers  and  operatives  on 
the  pay  rolls  at  each  of  the  establishments  may  be  compared  with 
those  who  separated  during  the  year.  (Use  the  length  of  service 
groups  as  the  stub.) 

c.  Draw  up  a  table  form  so  that  the  two  establishments  may  be 
directly  compared  for  each  type  of  occupation,  for  those  on  the  pay 
roll  at  the  end  of  the  year  and  for  those  who  separated  during  the 
year.     (Use  the  length  of  service  groups  as  the  stub.) 

5.  Using  the  following  tabulation  of  Failures  in  the  United 
States,  write  a  descriptive  comparison,  two  hundred  words,  of  the 
conditions  in  1919  compared  with  1918. 

In  what  ways,  if  at  all,  is  the  discussion  of  the  advantages  of 
tabulation,  Text  pp.  119-125,  borne  out? 

Summary  —  United  States  * 


                       Number               Assets                     Liabilities
Failures due to       1918   1919          1918          1919           1918           1919

Total                 5515   9331   $55,361,296   $70,322,293   $115,549,659   $137,907,644
Incompetence          2109   3409   $11,730,114   $20,967,819    $26,068,530    $37,139,453
Inexperience           307    629     1,740,312     2,919,880      5,510,902      6,508,802
Lack of capital       1669   3093    15,837,726    20,516,528     29,378,542     42,543,457
Unwise credits          72    123     2,869,310       971,439      4,534,615      2,436,522
Failures of others      97     86     2,046,947     2,785,374      3,844,066      4,558,718
Extravagance            59     56       612,889       389,004      1,374,864        827,083
Neglect                 93    139       340,426       456,907        934,622      1,178,563
Competition             59    116       476,852       592,176        945,009      1,045,733
Specific conditions    623   1107    12,095,267    13,779,286     23,671,566     27,312,198
Speculation             37     33     1,112,845       884,453      2,640,534      1,668,649
Fraud                  390    540     6,498,608     6,059,427     16,646,409     12,688,466

* Bradstreet's, January 31, 1920, p. 81.
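
Before a comparison is written, the mechanical accuracy of such a table may be tested by adding the items in each column and comparing the sums with the totals given. The following Python sketch does so for the figures above. The names `COLUMNS`, `TOTAL`, `CAUSES`, `column_sums`, and `discrepancies` are illustrative, and the year labels simply follow the column order of the table as printed.

```python
# Figures transcribed from the summary table above. Each tuple follows the
# printed column order: number, assets, and liabilities, two year columns each.
COLUMNS = ("number_1918", "number_1919", "assets_1918",
           "assets_1919", "liabilities_1918", "liabilities_1919")

TOTAL = (5515, 9331, 55_361_296, 70_322_293, 115_549_659, 137_907_644)

CAUSES = {
    "Incompetence": (2109, 3409, 11_730_114, 20_967_819, 26_068_530, 37_139_453),
    "Inexperience": (307, 629, 1_740_312, 2_919_880, 5_510_902, 6_508_802),
    "Lack of capital": (1669, 3093, 15_837_726, 20_516_528, 29_378_542, 42_543_457),
    "Unwise credits": (72, 123, 2_869_310, 971_439, 4_534_615, 2_436_522),
    "Failures of others": (97, 86, 2_046_947, 2_785_374, 3_844_066, 4_558_718),
    "Extravagance": (59, 56, 612_889, 389_004, 1_374_864, 827_083),
    "Neglect": (93, 139, 340_426, 456_907, 934_622, 1_178_563),
    "Competition": (59, 116, 476_852, 592_176, 945_009, 1_045_733),
    "Specific conditions": (623, 1107, 12_095_267, 13_779_286, 23_671_566, 27_312_198),
    "Speculation": (37, 33, 1_112_845, 884_453, 2_640_534, 1_668_649),
    "Fraud": (390, 540, 6_498_608, 6_059_427, 16_646_409, 12_688_466),
}


def column_sums(rows):
    """Add the items of every column across all causes."""
    return tuple(sum(col) for col in zip(*rows.values()))


def discrepancies(rows, total):
    """Return {column name: sum minus stated total} for columns that disagree."""
    sums = column_sums(rows)
    return {name: s - t for name, s, t in zip(COLUMNS, sums, total) if s != t}


if __name__ == "__main__":
    print(discrepancies(CAUSES, TOTAL) or "All columns agree with the totals.")
```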


CHAPTER  VI 

DIAGRAMMATIC  AND  GRAPHIC  PRESENTATION 

Rules  for  Diagrammatic  Presentation  of 
Statistical  Data  ^ 

A,   General  Make-up  of  Diagrams 

1.  Data  to  accompany  diagrams: 

The  data  shown  graphically  in  a  diagram  should 
be  given  in  tabular  form  either  beside  or  within 
the  diagram,  or  in  close  proximity  in  the  text. 
Care  should  be  exercised,  however,  to  place  fig- 
ures so  as  not  to  disturb  or  distort  the  visual  im- 
pressions conveyed  by  the  chart. 

2.  Scale  units: 

In  general,  in  the  laying  off  of  scales,  the  scale  in- 
tervals on  any  single  diagram  should  be  exactly 
proportionate  to  the  gradations  of  number,  size, 
or  time  represented  (the  logarithmic  scale  con- 
stitutes an  exception  to  this  rule).  .  .  . 

3.  Scale  figures: 

Figures  for  the  scales  of  a  diagram  should  be  placed 
at  the  left  and  at  the  bottom  or  along  the  respec- 
tive axes.  .  .  . 

^ Taken from Day, Reed, and Secrist, "Rules for Graphic Presentation
of Statistical Data," in Weekly Statistical News, Central Bureau of Planning
and Statistics, No. 5, Oct. 10, 1918, Washington, D. C.


4.  Base  lines : 

It  is  well  to  distinguish  —  as  by  heavier  inking  — 
lines  which  represent  standards  of  attainment  or 
bases  of  measurement  or  comparison.  .  .  . 

5.  Arrangement  of  items: 

Items  should  be  grouped  so  as  to  facilitate  the  com- 
parison of  items  most  significantly  related.  Within 
groups,  some  systematic  order  should  be  adopted. 
The  most  serviceable  arrangements  are  according 
to  (a)  the  sequence  of  the  items  in  time,  with  the 
earliest at the left; or, (b) the size of the items,
with  the  largest  at  the  top  or  at  the  left;  or,  (c) 
the  favorableness  of  the  items,  with  the  most  favor- 
able at  the  top  or  at  the  left. 

6.  Position  of  titles,  etc. : 

So  far  as  practicable,  all  printing  upon  a  diagram 
should  be  so  placed  as  to  read  with  ease  from  the 
bottom  of  the  sheet. 

7.  Use  of  colors: 

Where  a  need  for  duplicates  may  arise,  charts  should 
be  made  entirely  in  black  and  white.  The  use 
of  colors  is  not  recommended  except  for  large  wall 
charts. 

8.  Size  of  sheet: 

Avoid irregular sizes of paper. As far as practicable,
the established correspondence sizes (8 x 10½
or 8½ x 11) are to be used.

B.    Choice  of  Graphic  Forms 

1.    For  simple  comparisons  of  size: 

a  —  Bars  —  Bars  are  the  most  satisfactory  graphic 
device for this purpose. In general, all the bars
used in the diagrams of a single study should be
of  uniform  width.  .  .  . 

b  —  Lines  —  When  a  large  number  of  separate  items 
have to be shown in a single diagram, lines may
be employed in place of bars.

c  —  Position  —  Bars  (or  lines)  are  best  placed  hori- 
zontally. .  .  . 

2.  For  comparisons  of  component  parts: 

a  —  Subdivided  bars  —  Subdivided  bars  are  the  most 
satisfactory  form  for  this  case.  .  .  . 

b — Cross-hatching — Cross-hatching is the best way
in  which  to  distinguish  the  component  parts.  .  .  . 

c  —  Position  —  Horizontal  bars  are  to  be  pre- 
ferred to  vertical,  except  when  the  items  are  sepa- 
rated by  intervals  of  time,  in  which  case  vertical 
bars  should  be  used.  .  .  . 

3.  For  displaying  frequency  distributions: 

a — Vertical columns (histogram an alternative) —
In  general,  the  vertical  bar  (or  column)  form  is 
to  be  used.  The  straight-line  histogram,  how- 
ever, is  a  satisfactory  alternative. 

b  —  Position  of  scales  —  the  scale  for  the  variable 
is  to  be  placed  along  the  horizontal  axis ;  the 
scale  for  the  frequencies  along  the  vertical  axis. 

4.  For  showing  geographic  variations: 

a  —  Dot  maps  —  Where  the  variable  takes  the  form 
of  varying  numbers  of  a  given  item,  the  situation 
is  best  represented  by  a  dot  map  in  which  each 
dot  represents  a  fixed  number  of  cases.  All  the 
dots  should  be  of  uniform  size  and  should  be  evenly 
spaced  over  the  areas  in  which  the  actual  items 
have  appeared. 


b  —  Shaded  maps  —  Where  a  continuous  variable 
is to be shown, solid black and white and graded
cross-hatched  areas  constitute  the  most  satis- 
factory graphic  form.  Care  should  be  exercised 
to  secure  gradations  of  intensity  in  black  and  white, 
corresponding  closely  to  the  gradations  of  the 
variable. 

5.   For  showing  time  variations: 

a  —  Straight-line  graph  —  In  general,  the  use  of  the 
straight-line  graph  between  plotted  points  is  to 
be recommended. . . .

b — Position of scales — Intervals of time should
be  scaled  invariably  along  the  horizontal  axis.  .  .  . 

c  —  Zero  of  vertical  scale  —  There  is  a  strong  pre- 
sumption in  favor  of  the  appearance  of  the  zero 
of  vertical  scale  on  the  chart.  .  .  . 

d  —  Logarithmic  scale  —  The  logarithmic  scale  ver- 
tically is  to  be  used  when  rates  of  change  or  pro- 
portionate increases  or  decreases  are  to  be  em- 
phasized. When  the  logarithmic  scale  is  employed, 
the limits of the scale should be at some power of
ten. 
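
The reason for preferring the logarithmic scale when rates of change are emphasized may be shown by a short Python sketch: on such a scale equal proportionate increases occupy equal vertical distances, whereas on the natural scale they occupy ever larger ones. The series and the function names `log_position`, `linear_position`, and `power_of_ten_limits` are hypothetical, invented for illustration.

```python
import math


def log_position(value, low, high):
    """Fractional height (0 to 1) of `value` on a logarithmic scale from `low` to `high`."""
    return (math.log10(value) - math.log10(low)) / (math.log10(high) - math.log10(low))


def linear_position(value, low, high):
    """Fractional height (0 to 1) of `value` on a natural (arithmetic) scale."""
    return (value - low) / (high - low)


def power_of_ten_limits(values):
    """Nearest enclosing scale limits that are powers of ten."""
    low = 10 ** math.floor(math.log10(min(values)))
    high = 10 ** math.ceil(math.log10(max(values)))
    return low, high


if __name__ == "__main__":
    series = [100, 200, 400, 800]  # hypothetical series doubling each period
    low, high = power_of_ten_limits(series)
    for earlier, later in zip(series, series[1:]):
        log_step = log_position(later, low, high) - log_position(earlier, low, high)
        lin_step = linear_position(later, low, high) - linear_position(earlier, low, high)
        print(f"{earlier:>4} to {later:<4} log step {log_step:.3f}  natural step {lin_step:.3f}")
```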

Statistical  Standards  in  the  Graphic 
Presentation  of  Facts  ^ 

^ Adapted from Secrist, Horace, "Statistical Standards in Business
Research," Quarterly Publications of the American Statistical Association,
March, 1920, pp. 54-55.

The excuse for the use of graphics in statistical analysis
is largely if not wholly their universal appeal. Graphs
speak a common but frequently an inarticulate and confused
language. There is an attractiveness about them which
is alluring but often deceptive. Their appeal is visual and
instantaneous, not necessarily reasoned and reflective.

Distinguishing  between  rules  for  graphic  presentation 
and  the  standards  which  give  pertinency  to  the  rules,  the 
following  standards  may  be  formulated. 

First.  —  A  statistical  fact  and  its  form  of  representation 
should  agree.  By  this  single  standard,  deception,  whether 
resulting  from  a  confusion  of  the  apparent  with  the  real,  or 
the  superficial  with  the  fundamental,  is  fully  provided 
against.  The  object  of  statistical,  like  other  analysis,  is 
the  establishment  or  determination  of  truth.  Standards 
for  graphics  provide  for  their  use  in  influencing  but  never 
in  deceiving  men.  In  spite  of  the  standards  adhered  to, 
however,  both  results  may  be  accomplished  by  the  same 
graphic  device. 

Second.  —  Graphic  forms  should  be  selected  according 
to  their  psychological  appeal  and  their  ease  of  comprehen- 
sion, care  always  being  taken  not  to  violate  the  first  standard. 

Third.  —  Graphic  forms  should  be  chosen  in  accordance 
with  (a)  the  form  and  complexity  of  the  subject  matter 
illustrated, and (b) the type of consumer for whom they are
intended,  or  the  purpose  which  they  are  intended  to  serve. 

Fourth.  —  Graphic  devices  should  be  considered  more  as 
illustrations  of  analysis  than  methods  by  which  analysis 
is  made. 

Fifth.  —  Graphic  figures  should  be  drawn  as  accurately 
as  a  visual  representation  will  permit.  Accuracy,  of  course, 
is never absolute. In graphics, the realization of relative
accuracy  of  each  part  and  of  the  totality  is  the  standard 
set.  To  this  standard  for  graphics,  a  corollary  is  needed ; 
graphic  forms  should  always  be  accompanied  by  the  orig- 
inal data  which  they  represent. 


The  Theory  and  Justification  of  Curve 
Smoothing  ^ 

The  Theory  of  Smoothing  Statistical  Data.  —  It  may  often 
be  known  a  priori  that  phenomena  should  exhibit  a  regular 
progression,  and  that  data,  when  graphed,  showing  as  zig- 
zag lines,  do  not  really  represent  the  ideal  fact,  owing  either 
to  the  paucity  of  the  data,  or  to  unavoidable  error  therein. 

In  a  series  of  group-values,  i.e.  totals  or  aggregates  be- 
tween a  series  of  limits  of  a  variable,  it  is  important  to  bear 
in  mind  that  —  assuming  the  counts  on  which  they  depend 
to  be  correct  —  what  is  known  is  merely  the  series  of  aggre- 
gates themselves ;  the  probable  distribution  yielding  these 
aggregates  has  to  be  conjectured.  When  the  totals  or  aggre- 
gates are  themselves  regarded  as  subject  to  error,  then  the 
distribution  may  be  modified  within  the  limits  of  probable 
uncertainty,  some  groups  being  diminished  and  others,  par- 
ticularly adjoining  ones,  increased. 

There  are  four  principal  classes  of  data  to  which  the 
process of curve-smoothing is applicable. These may be
indicated  as  follows : 

(i)  Frequencies  of  a  phenomenon  at  successive  epochs 
or  during  successive  periods  of  time ;  as,  for  example, 
population  estimates  at  given  dates  and  numbers  of  deaths 
occurring  during  successive  years. 

(ii)  Rates  of  occurrence  of  a  phenomenon  per  unit  of 
reference  during  successive  periods ;  as,  for  example,  birth- 
rates per  thousand  of  population  per  annum  for  successive 
years. 

^ Adapted with permission from Knibbs, G. H., Commonwealth Statistician,
"The Mathematical Theory of Population, of its Character and
Fluctuations, and of the Factors which Influence Them, etc." Appendix A,
Vol. I, Census of the Commonwealth of Australia, Melbourne, pp. 86-88.

(iii) Frequencies in respect of successive values of characters
capable of continuous variations; as, for example,
the  number  of  persons  at  each  age  recorded  at  a  given 
census. 

(iv)  Rates  of  occurrence  of  a  phenomenon  per  unit  of 
reference  in  respect  of  successive  values  of  characters 
susceptible  of  continuous  variation ;  as,  for  example,  rates 
of  mortality  per  unit  per  annum  during  a  given  decennium 
in  respect  of  each  age. 

In  all  these  cases  the  characteristic  of  continuous  variation 
is  assumed  to  exist  either  actually  or  virtually.  Where 
statistical  results  are  discontinuous  such  a  process  is,  strictly 
speaking,  inapplicable ;  as,  for  example,  in  the  tabulation 
of  census  population  according  to  birthplace,  occupation, 
or  religion.  In  some  cases,  however,  although  the  data 
are  strictly  speaking  discontinuous,  the  principle  may  be 
applied  partially;  for  example,  in  the  case  of  a  tabulation 
of  dwellings  according  to  number  of  rooms  or  according 
to  number  of  inmates.  In  such  cases  the  character  pos- 
sessed is  progressive  without  being  continuous;  nevertheless, 
with  proper  qualifications,  the  smoothing  principle  may 
be  applied  even  to  these. 

Another  example,  more  nearly  approaching  but  not  at- 
taining continuous  variation,  is  the  representation  of  dwell- 
ings according  to  rental  value. 

Object  of  Smoothing.  —  From  the  foregoing  it  will  be  seen 
that  the  data  to  which  the  smoothing  process  is  strictly 
applicable  are  those  which  may  be  regarded  as  functions 
of  a  continuous  variable.  But  whether  such  functions  are 
readily expressible by means of algebraic formulae or not,
is,  of  course,  really  immaterial.  The  essence  of  the  matter 
is  that  in  any  instance  the  data  are  in  the  main  such  as 
admit  of  representation  by  means  of  a  continuous  line,  or  a 
continuous surface or solid in relation to continuous units
of  reference.  When  such  representation  has  been  made 
of  the  crude  results  of  observation,  it  is  ordinarily  found 
that the line, surface, or solid exhibits evidences of marked
irregularities  as  between  adjacent  points  or  series  of  points, 
their  general  trend,  however,  suggesting  an  underlying 
basis  of  orderly  progression.  This  progression  is,  of  course, 
affected  by  minor  influences  operating  at  individual  points, 
and  is  more  or  less  masked  by  the  paucity  of  the  data  on 
which  the  representation  has  been  based ;  thus  suggesting 
further  that  were  it  possible  to  obtain  data  of  unlimited 
extent,  these  irregularities  would  become  negligible.  For 
this  reason  the  object  of  the  smoothing  process  may  be  said 
to  be  that  of  removing  these  apparently  accidental  irregular- 
ities, and  of  thus  disclosing  the  basic  or  ideal  uniformity 
which  may  be  presumed  to  represent  the  facts  in  all  their 
generality. 

Justification  for  Smoothing  Process.  —  The  justifications 
for  the  smoothing  process  may  thus  be  said  to  be : 

(a) That the irregularity does not represent the phenomenon
in its generality, since much of the observed irregularity
is known a priori to be due only to paucity of data;

(b) or that it is known that the phenomenon subject to
observation  is  really  regular ; 

(c) or, again, that the observed data suggest that regularity
of trend will most efficiently represent them.

It  has  been  objected  that  any  system  of  smoothing  is, 
strictly  speaking,  unwarrantable,  since  such  a  process  vir- 
tually attempts  to  make  the  facts  accord  with  more  or  less 
questionable  preconceptions  regarding  them.  To  this  view 
it  may  be  rejoined  that  if  the  process  were  such  as  to 
produce results which, though smooth, differed systematically
and materially in their distribution from the original
observations, the objection would be valid. Where, however,
due consideration is given to the relative magnitudes
of  the  original  data,  and  the  smoothed  results  accord  there- 
with as  closely  as  the  data  will  allow  when  these  exhibit  a 
general  trend,  then  the  only  preconception  that  can  be  re- 
garded as  operative  is  the  justifiable  one  that  ordinarily 
natural phenomena do not progress per saltum. In this
connection  it  must  be  noted  that  where  there  is  distinct  evi- 
dence at  any  stage  of  a  cataclysmic  disturbance  of  results, 
the  smoothing  process  for  such  points  or  periods  will  usually 
be  invalid  or  not  properly  applicable.  Examples  of  such 
cataclysmic  disturbances  of  statistical  data  are  war,  famine, 
pestilence,  earthquake,  etc.  Even  in  these  cases,  however, 
it  appears  admissible  under  certain  circumstances  to  apply 
a  smoothing  process ;  as,  for  example,  in  cases  where  the 
disturbances referred to are of more or less frequent occurrence,
and are not merely isolated instances.

One  of  the  most  cogent  justifications  for  the  smoothing 
process  has  its  warrant  in  the  fact  that  the  recorded  results 
of  any  statistical  observations  are  necessarily  approximative, 
and  hence  that  the  value  of  the  function  recorded  for  any 
given  value  of  the  variable  is  probably  not  usually  more 
accurate  than  an  estimate  based  on  the  recorded  values  in 
respect  of  preceding  and  succeeding  values  of  the  variable. 
This  consideration  suggests  the  idea  of  weighting  successive 
observations to obtain most probable values, which idea
forms  the  basis  of  one  of  the  leading  methods  of  adjustment. 
Again,  where  the  results  of  the  observations  are  to  be  em- 
ployed as  guides  to  future  action,  it  is  clear  that  these  re- 
sults should,  as  far  as  practicable,  be  freed  from  all  fluctua- 
tions which  may  be  considered  merely  accidental,  and  thus 
unlikely  to  be  reproduced  in  future  experience.  This  is 
of considerable importance in connection with the construction
of mortality and sickness, superannuation, and similar
tables to be used in the computation of rates of premium,
and  for  the  conduct  of  valuations. 
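The idea of weighting successive observations, mentioned above, can
be sketched in a few lines. What follows is only an illustration under
stated assumptions: the weights 1, 2, 1 and the crude figures are
hypothetical, and the text does not prescribe this particular formula.

```python
def weighted_moving_average(values, weights=(1, 2, 1)):
    """Replace each interior value by a weighted mean of itself and its
    neighbours. End values lack a full set of neighbours and are kept."""
    if len(weights) % 2 == 0:
        raise ValueError("weights must have odd length")
    half = len(weights) // 2
    total = sum(weights)
    smoothed = list(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        smoothed[i] = sum(w * v for w, v in zip(weights, window)) / total
    return smoothed


# Hypothetical crude observations with accidental irregularities.
crude = [10, 14, 11, 16, 13, 18, 15]
smooth = weighted_moving_average(crude)
print(smooth)
```

Each smoothed value stays close to the original observation while the
zigzag between adjacent points is damped, which is the sense in which
the smoothed results are said to accord with the data as closely as
their general trend allows.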

REVIEW 

1.  How,  if  at  all,  does  the  above  discussion  apply  to  frequency 
series  showing : 

(1)  The  grades  assigned  to  civil  service  applicants  as  a  result  of 
a  written  examination? 

(2)  The  marks  assigned  as  a  result  of  an  oral  "mental  test"? 

(3)  The  number  of  workmen  working  classified  hours? 

(4)  The  number  of  brick  two-story  houses  per  unit  of  area  in  a 
residential  district  of  city  X? 

2.  How,  if  at  all,  does  the  above  discussion  apply  to  historical 
series  showing : 

(1)  The  number  of  troops  embarking  daily  for  France  from  the 
port  of  New  York,  June  1,  1918,  to  November  11,  1918? 

(2)  The  daily  total  stock  sales  on  the  New  York  Stock  Exchange, 
August  1,  1914,  to  October  1,  1914? 

(3)  The  number  of  personal  injuries  in  factory  X  from  January  1, 
1920,  to  June  30,  1920? 

Some  Advantages  of  the  Logarithmic  Scale  in 
Statistical Diagrams *

The  graphic  method  in  statistics  is  primarily  a  device 
for  presenting  vividly  the  significant  relations  of  phenomena. 
Each  slope  of  a  curve  in  an  ordinary  two-dimension  sta- 
tistical diagram  is  the  visible  expression  of  some  relation- 
ship. If  the  purpose  of  a  particular  statistical  presenta- 
tion is  simply  an  accurate  recording  of  separate  details,  a 
diagram  is,  of  course,  a  poor  substitute  for  plain  numerical 
statements ;  but  when  the  relative  aspects  of  the  data  are 
to  be  emphasized  the  diagram  comes  into  its  own. 

*  Adapted  with  permission  from  Field,  J.  A.,  "Some  Advantages  of  the 
Logarithmic  Scale  in  Statistical  Diagrams,"  Journal  of  Political  Economy, 
October,  1917,  pp.  806-841. 


And  yet,  even  within  this  sphere  of  its  special  excellence, 
graphic  representation,  in  terms  of  the  common,  natural 
scale  of  uniform  intervals,  has  very  real  limitations.  Too 
frequently, though the problem is simple and the diagram
is well done, the eye will fail to detect the precise nature
of  the  relationship  which  the  statistician  seeks  to  present. 

[Diagram I. Net Deposits (Heavy Line) and Reserves (Light Line)
of the Clearing-House Banks of New York City, according to the
41st Weekly Report (Early October) in Each Year, 1867-1909.
Natural scale; vertical axis in millions of dollars. Data (except
for the year 1888) from Statistics for the United States,
1867-1909, compiled for the National Monetary Commission by
A. Piatt Andrew.]

Some of the shortcomings of natural-scale representation
are fairly illustrated by Diagram I. The upper and
lower curves of this figure show, respectively, the net deposits
and the reserves of the New York Clearing-House
banks  in  early  October  of  each  year  from  1867  to  1909, 
inclusive.  From  the  diagram  in  this  form  certain  facts 
are  indeed  sufficiently  clear.  Both  deposits  and  reserves 
increased  markedly  during  the  period  under  review.  The 
increase  of  each,  though  on  the  whole  progressive,  has  been 
subject  to  appreciable  fluctuations ;  and  the  fluctuations 
of  one  curve  are  associated  with  synchronous  and  appar- 
ently similar  fluctuations  of  the  other.  The  amount  of  de- 
posits or  of  reserve  in  the  early  days  of  any  particular  Octo- 
ber may  be  estimated  by  consulting  the  scale  at  the  side 
of  the  diagram.  The  amount  of  increase  or  decrease  of 
either  item  during  a  given  year  or  term  of  years  is  not  diffi- 
cult to  determine  approximately.  All  this  information, 
then,  the  ordinary  scale  gives  adequately.  Some  of  it  would 
be  less  satisfactorily  given  by  any  other  scale.  But  if  we 
press  our  inquiries  further  and  ask,  on  the  basis  of  these  early 
October statements, whether, for example, the expansion
of  deposits  was  relatively  greater  in  the  year  after  the  crisis 
of  1907  than  in  the  year  after  the  crisis  of  1873,  or  whether 
the  contraction  of  deposits  was  relatively  greater  before 
1873  than  before  1896 ;  if  we  try  to  compare  the  percent- 
ages of  reserve  held  in  the  years  before  1870  with  the  cor- 
responding figures  since  1895 ;  or  if  we  wish  to  know  spe- 
cifically what  was  the  percentage  of  reserve  in  early  October 
of  1905,  deficiencies  of  the  natural  scale  are  revealed.  None 
of  these  questions,  which  concern  relations  rather  than  de- 
tached facts,  is  satisfactorily  answered  by  the  diagram. 
If  answers  are  forthcoming  at  all,  it  is  only  because,  through 
the  scales,  one  may  roughly  and  inconveniently  recover 
the  numerical  data  from  which  the  diagram  was  made. 
This, however, could have been more easily accomplished
by ignoring the diagram altogether and consulting its data
in  the  form  of  a  table. 

It  is  practicable,  of  course,  to  contrive  a  diagram,  drawn 
to  a  natural  scale,  with  the  special  purpose  of  bringing  out 
some  one  fact  or  relation  which  in  Diagram  I  has  remained 
obscure.  Thus  the  percentage  of  reserve  of  the  New  York 
banks  could  be  plotted,  year  after  year,  as  a  separate  curve. 
This  curve,  however,  would  in  turn  fail  to  show  the  abso- 
lute amounts  of  reserves  and  deposits.  The  difficulty 
is to devise a form of representation which shall show, directly
and graphically, both relative and absolute magnitudes.
A complete solution of this problem is hardly attainable,
but logarithmic diagrams in certain cases go far
toward  meeting  the  want  where  the  relative  aspects  of  the 
phenomena  are  primarily  to  be  emphasized. 

The  logarithmic  scale  may  indeed  be  described  as  a  scale 
of  ratios.  On  it  absolute  distances  measure  relative  magni- 
tudes. The  numbers  which  occur  at  equal  intervals  along 
a  logarithmic  scale  thus  form  not  an  arithmetic  but  a  geo- 
metric progression ;  and  consequently  the  same  propor- 
tionate relation  exists  between  any  two  numbers  a  given 
distance  apart  on  a  given  logarithmic  scale,  regardless  of 
their  absolute  magnitudes  and  regardless  of  their  absolute 
differences. Thus the numbers 2 and 4 on a logarithmic
scale are separated by the same distance as the
numbers  500,000  and  1,000,000,  for  the  simple  and  decisive 
reason  that  the  larger  number  of  each  pair  is  double  the 
smaller  number. 
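The claim that equal ratios occupy equal distances can be verified
numerically. The sketch below is illustrative only; the function name
is an assumption of this example, and distance is measured in units
of one decade, that is, the interval from 1 to 10.

```python
import math


def log_scale_distance(a, b):
    """Distance from a to b on a common-logarithm scale, measured in
    decades (the interval separating 1 from 10)."""
    return math.log10(b) - math.log10(a)


small_pair = log_scale_distance(2, 4)
large_pair = log_scale_distance(500_000, 1_000_000)
print(small_pair, large_pair)
```

Both distances equal the logarithm of 2, since in each pair the larger
number is double the smaller, although the absolute differences are
2 and 500,000 respectively.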

[Diagram II. The Logarithmic Scale from 1 to 100. Upper scale:
the natural numbers from 1 to 100 at logarithmic intervals;
lower scale: an ordinary uniform scale from 0 to 2.0.]

The mathematical principle of the scale is suggested by
Diagram II. Here the graduations above the horizontal
line mark off the intervals of a logarithmic scale from 1 to
100. The feature of this scale which at once strikes the
eye rather bewilderingly is that the interval between successive
numbers is not constant, but progressively narrows
as the numbers grow larger. Closer scrutiny reveals the
more significant and clarifying fact that the interval is constant
between numbers which bear to each other a given
ratio. Thus 1, 2, 4, 8, 16, 32 stand at equal distances apart;
as do 1, 3, 9, 27, or 1, 5, 25, or 1, 10, 100. The uniform
interval which separates the numbers of this last-named
series (successive powers of 10) has been taken as the
unit upon which is based the ordinary scale below the horizontal
in the diagram. If, now, any number on the upper
scale be regarded as a power of 10, it will be found that the
corresponding reading of the lower scale gives the index of
that power. This relation holds invariably; for not only
do we find 10 (i.e. $10^{1}$) opposite 1, 100 (i.e. $10^{2}$) opposite
2, and 1 (i.e. $10^{0}$) opposite 0, but the square root of 10 (i.e.
$10^{1/2}$, or 3.1623) is opposite 0.5; the square root of 1000
(i.e. $10^{3/2}$, or 31.623) is opposite 1.5; and so on indefinitely,
whatever the index of the power. In fact, the number
at any point of the lower scale is the common logarithm
of the number at the same point of the upper scale.¹

¹ The system of logarithms which is in ordinary use expresses any given
number as a certain power of 10. The logarithm of the given number
indicates what power of 10 that number is. Thus the logarithm of 10 is 1;
the logarithm of 100, i.e. of 10 × 10, or $10^{2}$, is 2; the logarithm of 1000, or
$10^{3}$, is 3, and so on. A logarithm is in fact an exponent (the index of a
power), and the derivation and uses of logarithms consequently follow
the algebraic rules of exponents. In the case of a number which is not an
even power of 10 it is possible to compute the logarithm in the form of a
fractional exponent. For example, as the text implies, the logarithm of
31.623, the square root of 1000, i.e. $\sqrt{10^{3}}$ or $10^{3/2}$, is 1.50. By extending the
principle of fractional exponents the logarithm of any assignable number
may be approximately expressed.

The peculiar advantage of the logarithmic scale in statistical work is a
consequence of the elementary logarithmic principle that the difference
between the logarithms of two numbers is the logarithm of the ratio of the
one number to the other. That is,

$$\log a - \log b = \log \frac{a}{b}.$$

Hence, whenever the ratio between two numbers, a and b, is the same as the
ratio between two other numbers, p and q, so that $\frac{a}{b} = \frac{p}{q}$ and
$\log \frac{a}{b} = \log \frac{p}{q}$, it will follow that $\log a - \log b = \log p - \log q$.
Plotted to a given natural scale, log a and log b would thus differ by the
same interval as log p and log q, the equality of these differences
indicating the equality of the ratios $\frac{a}{b}$ and $\frac{p}{q}$.
The device of plotting statistical quantities in terms of their logarithms is,
then, simply an exploiting of the general principle that the absolute difference
between two logarithms is a measure of the relative difference of the
numbers to which they correspond.

If, now, it is desired to use the logarithmic scale in the
construction of a statistical diagram, we may proceed in
either of two ways. We may reduce the data to logarithmic
terms, and then, using an ordinary natural scale,
plot the logarithms of the given quantities instead of the
quantities themselves. Or if we have at our disposal coordinate
paper ruled at logarithmic intervals, like the
intervals of the upper scale of Diagram II, we may work
directly, without any reduction of the data, locating the
points of the diagram quite mechanically by the graduations
of the paper, and relying upon these graduations for the
logarithmic character of the result. The two methods are
entirely equivalent, as should be evident from Diagram II.
Indeed it is often convenient to regard a diagram as constructed
by both methods, and to supply for its more complete
explanation a logarithmic scale of the natural numbers
on one side, and a natural scale of their logarithms
on the other.¹

Before  attempting  a  logarithmic  presentation  of  the  bank 
data  of  Diagram  I,  it  will  be  well  to  consider,  in  artificially 
simplified  cases,  certain  general  properties  of  logarithmic 
diagrams  which  furnish  the  key  to  their  interpretation. 

[Diagram III. Arbitrary Example of a Phenomenon Increasing
by Equal Relative Oscillations. Natural scale; vertical axis:
magnitude; horizontal axis: years.]

Let us take for our first illustration the arbitrary example
of Diagram III. Here an assumed phenomenon, which has
a magnitude of 1 when it is first observed, increases to 5
in the course of a year and then, in the second year, falls
off to 2½. In the third year it again increases fivefold, to
12½. In the fourth year it again declines by half, to 6¼.
Thus alternately quintupled and cut in two, the phenomenon
grows by perfectly regular oscillations. Diagram III,
which is drawn to an ordinary natural scale, shows vividly
the accelerated character of this increase, stated in absolute
numbers; but precisely because it is a natural-scale
diagram it fails to show at all obviously that the rate of
relative rise and fall is the same for all the oscillations.
The earlier waves of the curve, which are absolutely small,
are made to seem in all respects comparatively insignificant.

¹ For an example of this treatment see Diagram IV on p. 289.


[Diagram IV. Arbitrary Example of a Phenomenon Increasing
by Equal Relative Oscillations. Logarithmic vertical scale, with
scales of magnitude and of logarithm; horizontal axis: years.
Data of Diagram III.]

Strikingly  different  is  the  effect  of  Diagram  IV,  in  which 
the  data  of  Diagram  III  are  plotted  to  a  logarithmic  scale. 
Absolute  magnitudes  here  can  be  determined  only  from  the 
numbers  of  the  scale :  the  graphic  evidence  of  the  diagram 
establishes  the  identity  of  the  relative  changes,  step  by 
step,  for  the  whole  serrate  curve.  Every  ascent  has  the 
same  vertical  rise.  That  is,  the  indicated  percentages  of 
increase  are  uniform.  Each  decline  has  the  same  drop : 
the percentage of decrease shown by each is the same.
This equal relative significance of equal absolute distances
is  the  essential  characteristic  of  the  logarithmic  scale. 
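The contrast between Diagrams III and IV can be reproduced from the
figures given in the text. The sketch below rebuilds the series (1, 5,
2½, 12½, 6¼, and so on) and compares the year-to-year changes in
absolute terms with the same changes in logarithmic terms. The function
name and the number of years carried are assumptions of this example.

```python
import math


def oscillating_series(start=1.0, up=5.0, down=0.5, steps=8):
    """Alternately multiply by `up` and by `down`, beginning with `up`."""
    values = [start]
    for step in range(steps):
        factor = up if step % 2 == 0 else down
        values.append(values[-1] * factor)
    return values


series = oscillating_series()
pairs = list(zip(series, series[1:]))

# Changes as they appear on a natural scale (Diagram III).
absolute_changes = [b - a for a, b in pairs]
# Changes as they appear on a logarithmic scale (Diagram IV).
log_changes = [math.log10(b) - math.log10(a) for a, b in pairs]

for year, (abs_c, log_c) in enumerate(zip(absolute_changes, log_changes), 1):
    print(f"year {year}: absolute {abs_c:+.4f}, logarithmic {log_c:+.4f}")
```

The absolute changes grow from one oscillation to the next, whereas
every rise has the same logarithmic height (the logarithm of 5) and
every fall the same logarithmic depth (the logarithm of ½).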

Certain  fairly  obvious  but  important  corollaries  follow 
from  this  fundamental  principle.  Since  the  upstrokes  of 
the  curve  in  Diagram  IV  are  all  straight  lines  rising  by  the 
same  amount,  and  since  each  rise,  occurring  in  the  same 
period of time, is allotted in the diagram the same horizontal
distance, it follows that the slope of the several upstrokes
is the same. The downstrokes are similarly all
of  the  same  slope.  Quite  generally,  where  a  curve  is  drawn 
to  a  logarithmic  vertical  scale  and  a  natural  horizontal 
scale,  equal  slopes  indicate  equal  rates  of  relative  change. 
By  extension  of  this  rule  it  will  be  seen  that  a  constant 
rate  of  increase  is  represented  in  a  logarithmic  curve  by  a 
constant  slope  —  i.e.  by  a  straight  line ;  and  that  wherever 
in  such  a  logarithmic  diagram  two  curves  run  parallel,  in 
the  sense  that  the  vertical  distance  between  them  remains 
unaltered,  the  phenomena  which  they  respectively  repre- 
sent maintain  to  each  other  a  constant  ratio,  inasmuch  as 
any change of the one is evidently coincident with a change
of  the  other  to  the  same  relative  extent. 

These  generalizations  may  be  simply  illustrated  by  the 
examples  which  follow. 

In  Diagram  V,  drawn  to  natural  scale,  the  continuous 
curve  traces  the  growth  of  the  population  of  the  United 
States,  according  to  the  decennial  enumerations  of  the 
United  States  Census  from  1790  to  1910,  inclusive.  The 
broken  line,  uppermost  in  the  diagram,  shows  what  the 
growth  of  population  would  have  been  if  the  rate  of  relative 
increase  observed  between  1790  and  1800  —  35.1  per  cent 
for  the  decade  —  had  persisted  without  change  since  that 
time.  The  dotted  line  at  the  bottom  of  the  figure  shows 
what the growth would have been if the absolute increase
of population in each decade since 1800 had been the same
as the increase (1,379,269 persons) from 1790 to 1800.
In other words, these two additional curves represent respectively
geometric and arithmetic progressions based on
the observed increase in the first intercensal period. It
is to be noted that in a natural-scale construction the curve
of arithmetic progression is a straight line.

[Diagram V. Growth of the Population of the United States,
1790-1910. The continuous line shows the actual increase
according to the census returns. The broken and dotted lines
show the growth which would have taken place if relative and
absolute increase, respectively, had continued at the rate of the
first decade. Natural scale; vertical axis: population in millions;
horizontal axis: census years. Data from 13th Census of the
United States, I, 24. The corrected estimate for 1870 has been
taken instead of the original enumeration.]

In Diagram VI, drawn to a logarithmic scale, the continuous
line, the broken line, and the dotted line represent
each  the  same  data  as  in  Diagram  V.  But  here  the  char- 
acter of  the  curves  is  significantly  different.  The  dotted 
arithmetic-progression  curve,  recording  a  constantly  di- 
minishing ratio  of  increase,  falls  away  in  this  figure  more 
and  more  toward  the  horizontal.  And  here  it  is  the  geo- 
metric progression  which  appears  as  a  straight  line,  its 
constant  slope  denoting  a  constant  rate  of  increase  —  i.e. 
the  same  relative  increase  in  every  equal  period  of  time. 
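The behaviour of the two hypothetical curves may be checked by
computing them. In the sketch below the first-decade increase of
1,379,269 is taken from the text; the 1790 population of 3,929,214 is
an assumption supplied from the census returns, chosen because it
reproduces the 35.1 per cent rate stated above. The variable names are
likewise this example's own.

```python
import math

POP_1790 = 3_929_214            # assumed census figure, not quoted in the text
INCREASE_1790_1800 = 1_379_269  # first-decade increase, as stated in the text

# Ratio of increase per decade, about 1.351.
ratio = 1 + INCREASE_1790_1800 / POP_1790

decades = range(13)  # the census years 1790, 1800, ..., 1910
geometric = [POP_1790 * ratio ** k for k in decades]
arithmetic = [POP_1790 + INCREASE_1790_1800 * k for k in decades]


def log_steps(series):
    """Vertical rise per decade when the series is plotted to a log scale."""
    return [math.log10(b) - math.log10(a) for a, b in zip(series, series[1:])]


geo_steps = log_steps(geometric)
ari_steps = log_steps(arithmetic)
print(geo_steps[0], geo_steps[-1])
print(ari_steps[0], ari_steps[-1])
```

The geometric series rises by the same logarithmic step in every
decade, and so plots as a straight line; the arithmetic series rises by
steps that shrink from decade to decade, and so falls away toward the
horizontal.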

The  growth  of  funds  invested  at  compound  interest  af- 
fords another  instance  of  geometric  increase  and  therefore 
another  example  of  a  straight-line  curve  if  a  diagram  is 
drawn  to  a  logarithmic  scale.  The  slope  of  the  curve  here 
depends  upon  the  rate  of  interest  and  the  interval  between 
dates  at  which  the  interest  is  regularly  compounded ;  but 
for  a  given  rate  and  interval  it  is  fixed  and  constant. 
Hence,  a  logarithmic  chart  equivalent  to  a  compound- 
interest  table  may  very  readily  be  constructed.  Diagram 
VII  is  such  a  chart.  In  it  a  single  straight  line  suffices  to 
indicate the amount to which an initial sum of $100, compounded
semiannually at a given rate, will have increased
on any compounding date included in the diagram.¹ The
4 per cent line is steeper than the 3 per cent line; the
5 and 6 per cent lines are successively steeper still; but
all are straight, and for each, when the scales of the
diagram are once determined, the slope is fixed and characteristic.

¹ The period of time covered by such a chart is of course in principle
unlimited, for the lines will continue with their same specific slopes however
far the diagram may be extended.

[Diagram VI. Growth of the Population of the United States,
1790-1910. Logarithmic vertical scale; vertical axis: population
in millions; horizontal axis: census years. Data and
explanations as in Diagram V.]

[Diagram VII. Compound-Interest Chart (Semiannual
Compounding). Logarithmic vertical scale; vertical axis: amount.]
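The straightness of the compound-interest lines, and the ordering of
their slopes, can be confirmed by a short computation. The sketch
assumes, as the text does, an initial sum of $100 compounded
semiannually at nominal annual rates of 3, 4, 5, and 6 per cent; the
function names are this example's own.

```python
import math


def amount(principal, annual_rate, half_years):
    """Accumulated amount after the given number of semiannual
    compoundings at half the nominal annual rate each time."""
    return principal * (1 + annual_rate / 2) ** half_years


def log_step(annual_rate):
    """Vertical rise per half-year on a common-logarithm scale."""
    return math.log10(1 + annual_rate / 2)


rates = [0.03, 0.04, 0.05, 0.06]
slopes = [log_step(r) for r in rates]

# The 6 per cent line over the first two years (four compoundings).
six_percent_line = [amount(100, 0.06, n) for n in range(5)]
six_percent_steps = [
    math.log10(b) - math.log10(a)
    for a, b in zip(six_percent_line, six_percent_line[1:])
]
print(slopes)
print(six_percent_steps)
```

For a given rate the logarithmic step is the same at every compounding
date, which is why each line is straight; and the step grows with the
rate, which is why the 6 per cent line is the steepest of the four.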

The  same  diagram  serves  also  to  illustrate  another  prop- 
erty of  logarithmic  diagrams  that  has  already  been  men- 
tioned. The  broken  line  across  the  middle  of  the  figure 
has  been  drawn  to  show  the  increase  of  $125,  compounded 
semiannually at 6 per cent. It is at once apparent that
this  line  parallels  the  continuous  line  of  the  increase  of  $100 
at  the  same  rate.  The  reason  for  the  parallelism  is  toler- 
ably patent.  Each  of  the  sums,  $100  and  $125,  increases 
every  six  months  by  3  per  cent  of  its  accumulated  amount. 
That is, each sum is semiannually multiplied by 1.03.
In  the  diagram,  therefore,  each  of  the  two  lines  must  rise, 
from  one  ordinate  to  the  next,  by  the  fixed  vertical  distance 
which,  on  the  logarithmic  scale,  corresponds  to  the  ratio 
1.03 :  1.00.  This,  of  course,  insures  that  both  rise  alike. 
Or  it  may  rather  be  argued  that  since  original  sums  in  the 
proportion  of  1.25  to  1  are  here  assumed  to  be  compounded 
at  the  same  rate  and  the  same  interval,  the  cumulative 
results  will  be  at  any  subsequent  time  in  the  same  propor- 
tion of  1.25  to  1.  The  vertical  distance  between  the  two 
curves  on  any  ordinate  must  therefore  express  the  ratio 
1.25 :  1.00,  and  hence,  since  a  given  ratio  always  corresponds 
to  the  same  absolute  interval  on  a  logarithmic  scale,  the 
curves  must  be  always  at  the  same  distance  apart  and  there- 
fore parallel.  It  follows  that  if  a  point  be  taken  on  the 
initial  ordinate  of  this  diagram,  opposite  the  value  $125 
of  the  vertical  scale,  the  straight  line  drawn  through  that 
point  parallel  to  the  original  6  per  cent  curve  will  represent 
the compound increase of $125 at 6 per cent. Similarly,
to find the increase of any capital sum at any rate of compound
interest, one has only to draw a straight line starting
at the height which denotes the given sum and running
parallel  to  a  standard  curve  for  the  given  rate  of  interest. 
In  Diagram  VII  this  principle  has  a  somewhat  different 
application. Through the point representing a sum of $200
at  the  end  of  6  years  have  been  drawn  broken  lines  parallel 
to  the  standard  curves  showing  respectively  3  per  cent, 
4  per  cent,  5  per  cent,  and  6  per  cent  increase.  These 
several broken lines cut the initial ordinate at heights which,
read  in  terms  of  the  vertical  scale,  show  what  amount  of 
money,  compounded  semiannually  at  each  respective  rate 
of  interest,  would  amount  to  $200  after  6  years.  .  .  . 
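Both uses of the parallel-line principle described above can be worked
through arithmetically. The sketch below first checks that sums of $125
and $100 at 6 per cent keep a constant vertical distance on the
logarithmic scale, and then finds the initial sums that would amount to
$200 after 6 years, which is what the broken lines read off on the
initial ordinate. The function names are assumptions of this example.

```python
import math


def amount(principal, annual_rate, half_years):
    """Accumulated amount after semiannual compounding."""
    return principal * (1 + annual_rate / 2) ** half_years


def initial_sum(target, annual_rate, years):
    """Sum which, compounded semiannually, reaches `target` after `years`."""
    return target / (1 + annual_rate / 2) ** (2 * years)


# Vertical distance between the $125 and $100 lines at each half-year.
gaps = [
    math.log10(amount(125, 0.06, n)) - math.log10(amount(100, 0.06, n))
    for n in range(13)
]

# Initial sums amounting to $200 after 6 years at each rate.
needed = {rate: initial_sum(200, rate, 6) for rate in (0.03, 0.04, 0.05, 0.06)}
for rate, value in needed.items():
    print(f"{rate:.0%}: {value:.2f}")
```

The gap is the logarithm of 1.25 at every date, so the two lines are
parallel; and the higher the rate of interest, the smaller the initial
sum required, so that the steeper broken lines cut the initial ordinate
at lower points.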

Since  logarithmic  scales  have  no  zero,  logarithmic  dia- 
grams can  have  no  base-line  at  zero.  Indeed,  they  have 
no base-line at all; or, rather, every value of the logarithmic
scale is as much a base-value as any other. This follows
from the cardinal principle already repeatedly stated,
that  the  same  absolute  interval  stands  for  the  same  ratio 
of  magnitudes  at  any  and  every  part  of  a  given  logarithmic 
scale.  It  obviously  constitutes  an  essential  distinction 
between  logarithmic  and  natural-scale  diagrams.  In  a  nat- 
ural-scale diagram  the  importance  of  showing  the  base- 
line at  zero  of  the  vertical  scale  can  hardly  be  urged  too 
strongly.  If  this  base-line  be  omitted,  as  it  often  is  in  un- 
intelligent work,  proper  visual  estimation  of  relative  magni- 
tudes is  made  impossible.  Such  omissions  in  complex 
natural-scale diagrams involving more than one base-line
lead  to  extreme  confusion  and  fallacy.  In  logarithmic 
diagrams  fallacious  effects  of  this  particular  sort  are  impos- 
sible ;  but  any  suggestion  of  a  specific  base-line  may  prove 
disconcerting  to  those  unfamiliar  with  the  logarithmic  scale 
and  may  cause  misconception  of  its  character. 


The  principles  which  have  thus  far  been  developed  may 
now  be  recapitulated : 

Throughout  a  given  diagram,  and  regardless  of  the  abso- 
lute magnitudes  concerned : 

(1) a given distance between any two points, measured
    along a logarithmic scale, indicates in every case
    the same ratio between the two magnitudes which
    the positions of the points represent;

(2) when changing magnitudes are plotted to a vertical
    logarithmic scale, and unit intervals of time
    are plotted to a horizontal natural scale,

    (a) the slope of a curve is always an index of the
        rate of relative change;

    (b) a straight line represents a constant rate of
        relative change; and, conversely, a constant
        rate of relative change is always represented
        by a straight line;

    (c) where the vertical distance between two curves
        is constant the variables which they respectively
        represent maintain always the
        same proportion one to the other; and,
        conversely, two variables constantly in the
        same proportion are always represented by
        two curves at a fixed vertical interval.

The  logarithmic  scale  admits  of  no  zero,  and  in  terms  of  a 
logarithmic  scale  no  base-line  should  ordinarily  be  indicated. 

With  these  general  principles  in  mind  we  may  now  con- 
sider Diagram  VIII,  in  which  the  bank  statistics  of  Dia- 
gram I  are  plotted  to  a  logarithmic  scale.  The  questions 
which  Diagram  I  failed  to  answer  find  here  a  ready  solu- 
tion, and  incidentally  illustrate  certain  useful  devices  for 
the  interpretation  of  logarithmic  diagrams  in  general. 


The relative expansion of deposits, evidenced by the absolute
rise of the upper curve in Diagram VIII, was plainly
greater  in  the  year  following  October,  1873,  than  in  the  year 
following  October,  1907.  How  great  it  was  in  either  year 
may  be  determined  with  the  aid  of  the  percentage  scale 
of  increase  at  the  right  of  the  main  figure.  This  scale,  it 
is  to  be  noted,  holds  good  for  vertical  measurements  at  all 
parts  of  the  diagram,  since  its  logarithmic  intervals  make 
it  a  scale  of  ratios,  quite  independent  of  absolute  magni- 
tudes. The  vertical  rise  of  the  deposit  curve  following 
1873  shows  by  the  scale  an  increase  of  approximately  50 
per  cent.  The  rise  after  1907,  similarly  measured,  is  some 
38  per  cent. 

Relative  decreases  of  deposits  can  be  tested  in  a  manner 
quite  analogous  by  the  logarithmic  scale  of  percentage 
decrease.  Here,  for  convenience,  the  scale  reads  from  the 
top  downward,  rather  than  up  from  the  bottom,  as  in  the 
scale  of  increase.  The  contraction  of  deposits  from  Octo- 
ber, 1871,  to  October,  1873,  as  measured  by  the  decrease 
scale,  was  about  27  per  cent  —  appreciably  greater  than  the 
contraction  of  some  22  per  cent  during  the  two  years  pre- 
ceding October,  1896. 
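
The percentage scales of increase and decrease are simply vertical distances relabelled. The sketch below is an illustration under stated assumptions, not the construction used for Diagram VIII: distances are measured in common-logarithm units, the function names are hypothetical, and the percentages are the approximate readings quoted in the text.

```python
import math

def distance_for_change(percent_change):
    """Vertical distance, in log10 units, that corresponds to a percentage change."""
    ratio = 1 + percent_change / 100
    if ratio <= 0:
        raise ValueError("a decrease of 100 per cent or more cannot be plotted")
    return math.log10(ratio)

def change_for_distance(distance):
    """Percentage change that corresponds to a vertical distance in log10 units."""
    return (10 ** distance - 1) * 100

rise_1873 = distance_for_change(50)    # about 50 per cent after October, 1873
rise_1907 = distance_for_change(38)    # some 38 per cent after October, 1907
fall_1873 = distance_for_change(-27)   # about 27 per cent, 1871 to 1873
fall_1896 = distance_for_change(-22)   # some 22 per cent before October, 1896

# An increase and a decrease of the same percentage are not equal distances,
# which is why the decrease scale is graduated separately.
up_50 = distance_for_change(50)
down_50 = distance_for_change(-50)

print(round(rise_1873, 4), round(rise_1907, 4))
print(round(fall_1873, 4), round(fall_1896, 4))
print(round(up_50, 4), round(down_50, 4))
```

Because the conversion depends only on the distance, the same scale may be applied at any height of the diagram.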

The  proportion  of  reserve  to  deposits  at  any  given  date 
is  obviously  to  be  determined  from  Diagram  VIII  by  meas- 
uring the  appropriate  vertical  distance  between  the  reserve 
curve  and  the  deposit  curve.  For  this  purpose  one  might 
use  the  scales  designed  to  measure  increase  and  decrease. 
Thus,  in  October,  1905,  deposits  were  not  quite  four  times 
as  great  as  reserves,  according  to  the  multiple  scale.  Inter- 
preted by  the  scales  of  decrease,  reserves  were  equivalent 
to  slightly  more  than  a  quarter  of  the  deposits,  or  were 
some  74  per  cent  less  than  the  deposits.  None  of  these 
statements, however, expresses reserves in the conventional
way as a percentage of deposits. For convenience,
therefore, a special inverse logarithmic scale is provided at
the extreme right of the figure. If a given vertical interval
between the reserve curve and the deposit curve is laid off
on this scale, from the bottom upward, the reading of the
inverse scale states the reserve directly as a percentage of
deposits. In October, 1905, it thus appears that the reserve
stood at 26 per cent. The rough parallelism of the two
curves throughout their whole course shows that the percentage
of reserve has not greatly changed. Nevertheless,
it is tolerably clear that the reserves held in early October
were rather larger before 1870 than since 1895; for in the
former period the curves are nearer together. The last
of the questions which Diagram I left unsettled thus finds
its answer in Diagram VIII. ...

[Diagram VIII, not reproduced: the bank statistics of Diagram I plotted to a logarithmic vertical scale, with auxiliary scales of percentage of increase, percentage of decrease, multiples, and fractional parts, and an inverse scale of reserve as a percentage of deposits.]
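
The several ways of stating the reserve in October, 1905, are one and the same vertical interval read on different scales. A brief sketch follows, assuming only the 26 per cent figure given in the text; the variable and function names are hypothetical.

```python
import math

reserve_share = 0.26                       # reserve as a fraction of deposits

multiple = 1 / reserve_share               # deposits as a multiple of reserves
percent_less = (1 - reserve_share) * 100   # reserves fall short of deposits by this
interval = math.log10(multiple)            # vertical gap between the two curves

def share_from_interval(gap):
    """Inverse scale: reserve as a percentage of deposits for a given gap."""
    return 100 * 10 ** (-gap)

print(round(multiple, 2))        # not quite four times
print(round(percent_less, 1))    # some 74 per cent less
print(round(share_from_interval(interval), 1))
```

A narrower interval between the curves gives a larger reading on the inverse scale, which is the sense in which curves "nearer together" indicate larger reserves.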

Another  merit  of  no  slight  importance  is  to  be  recorded 
for  the  logarithmic  scale :  it  is  far  superior  to  the  natural 
scale  for  effecting  comparisons  when  very  small  and  very 
large  quantities  must  be  taken  into  account  concurrently.  .  .  . 
Whenever  a  historical  curve  records  extreme  growth,  the 
same  advantage  is  found.  It  is  not  necessary  to  dwarf 
the  small  beginnings  in  order  to  keep  the  later  develop- 
ment within  manageable  dimensions.  A  study  of  Dia- 
grams III  and  IV  will  illustrate  this  point.  More  striking 
illustration  is  offered  in  Diagrams  IX  and  X.  The  pro- 
duction of  tinplate  in  1891  and  the  years  immediately  fol- 
lowing was  so  small  that  the  ordinary  diagram  (Diagram 
IX)  leaves  inconspicuous  the  extremely  rapid  rate  of  prog- 
ress in  output  during  those  first  years.  The  logarithmic 
diagram  (Diagram  X)  quite  reverses  the  emphasis.  Plainly, 
the  recent  increase  has  been  far  from  proportionate  to  the 
exuberant  growth  of  the  infant  industry. 
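
The reversal of emphasis between the two diagrams can be imitated with a small computation. The figures below are hypothetical round numbers chosen to resemble an infant industry; they are not the data from Dunbar on which Diagrams IX and X are based.

```python
import math

# Hypothetical output in tons at five-year steps (illustrative only).
output = {1891: 1_000, 1896: 150_000, 1901: 400_000, 1906: 600_000, 1911: 800_000}

years = sorted(output)
absolute_rise = []   # what a natural scale displays
log_rise = []        # what a logarithmic scale displays
for start, end in zip(years, years[1:]):
    absolute_rise.append(output[end] - output[start])
    log_rise.append(math.log10(output[end] / output[start]))

for (start, end), rise, step in zip(zip(years, years[1:]), absolute_rise, log_rise):
    print(f"{start}-{end}: rise of {rise:>7,} tons; log-scale rise {step:.3f}")
```

On these assumed figures the largest absolute rise falls in the second interval, while by far the largest logarithmic rise falls in the first, so that the natural scale and the logarithmic scale direct the eye to different periods.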

Although  the  years  of  small  beginnings  in  a  historical 



record  may  present  no  features  that  require  special  consid- 
eration, the  logarithmic  historical  diagram  is  again  advan- 
tageous whenever  substantially  the  same  rate  of  relative 
increase  characterizes  the  whole  period  under  review.     In 


Diagram    IX.  —  Annual    Production     of    Tinplate    in    the 
United  States,   1891-1912 

Natural  Scale 
Data  from  D.  E.  Dunbar:    The  Tin  Plate  Industry,  p.  15 


[Diagram IX, not reproduced: vertical axis, tons, natural scale graduated to 1,000,000; horizontal axis, calendar years 1891 to 1912.]

such  cases  the  general  trend  or  growth-axis  of  the  loga- 
rithmic curve  will  of  course  be  nearly  straight.  This  is 
interesting  for  its  evidence  of  consistent  growth.  It  has 
the  further  technical  merit  of  permitting  the  trend  of  the 
curve  to  be  approximately  maintained  throughout  at  any 




desired  slope  by  the  mere  choice  of  dimensions  for  the  dia- 
gram. Hence  such  curves  can  readily  be  kept  close  to  an 
inclination  of  45°,  with  the  result  that  irregularities  of  di- 
rection are  much  more  easily  noticed  than  if  the  slope  were 

Diagram  X.  —  Annual  Production  of  Tinplate  in  the  United 

States,  1891-1912 


Logarithmic Vertical Scale
Data of Diagram IX

[Diagram X, not reproduced: vertical axis, tons, logarithmic scale from 1,000 to 1,000,000; horizontal axis, calendar years 1891 to 1912.]


as  steep  or  as  flat  as  in  natural-scale  diagrams  some  parts 
of  the  curve  often  must  be. 

For  the  plotting  of  index-numbers  logarithmic  diagrams 
are  particularly  appropriate,  for  here  the  numbers  them- 
selves are  ratios,  and  their  relative  aspect  is  important. 
If  an  index  number  of  general  prices  should  rise  from  80  to 
100,  and  later  from  100  to   120,  the  two  changes  would 



appear  of  equal  significance  in  an  ordinary  diagram.  Yet 
the  first  is  an  increase  of  25  per  cent,  the  second,  an  in- 
crease of  but  20  per  cent.  In  their  effects  upon  the  pur- 
chasing power  of  stated  money  incomes  the  two  changes 
are  by  no  means  the  same.     A  logarithmic  diagram  reveals 

Diagram  XI.  —  Course  of  the  General  Index  Number  of 
Wholesale  Prices  Published  by  the  United  States  Bureau 
OF  Labor  Statistics,  1890-1914 


(Average Prices for the Period 1890-99 are Taken as 100)
Natural Scale

[Diagram XI, not reproduced: vertical axis, index numbers, natural scale from 0 to 160; horizontal axis, years 1890 to 1914.]


their  significant  difference.  Diagrams  XI  and  XII  con- 
trast the  natural-scale  method  with  the  logarithmic-scale 
method  in  the  case  of  the  general  index  number  of  whole- 
sale prices  from  1890  to  1914,  published  by  the  United 
States   Bureau   of   Labor   Statistics.     It  will  be  remarked 




that the logarithmic figure, which does not require a zero
base-line in order to convey a true sense of relative values,
permits a considerable saving of space. . . .
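
The index-number example given above (a rise from 80 to 100, and later from 100 to 120) may be worked out as follows. This is a minimal sketch; the logarithmic step is expressed in common-logarithm units, which is an assumption about the scale and not a statement of how Diagram XII was drawn.

```python
import math

changes = [(80, 100), (100, 120)]

natural_steps = []   # vertical rise on a natural scale
percent_steps = []   # relative change in per cent
log_steps = []       # vertical rise on a logarithmic scale

for old, new in changes:
    natural_steps.append(new - old)
    percent_steps.append((new - old) * 100 / old)
    log_steps.append(math.log10(new / old))

for (old, new), n, p, l in zip(changes, natural_steps, percent_steps, log_steps):
    print(f"{old} to {new}: natural rise {n}, {p:.0f} per cent, log rise {l:.4f}")
```

The two natural-scale rises are identical, while the logarithmic rises differ in the same sense as the percentages.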

From the illustrations which have been offered it will
have appeared first of all that logarithmic diagrams present
ratios and relative changes as directly and simply (though
not, to the uninitiated eye, so obviously) as natural-scale
diagrams present absolute differences. Consequently the

Diagram XII. — Course of the General Index Number of
Wholesale Prices Published by the United States Bureau
of Labor Statistics, 1890-1914


Logarithmic Vertical Scale
Data of Diagram XI

[Diagram XII, not reproduced: vertical axis, index numbers, logarithmic scale from 90 to 140; horizontal axis, years 1890 to 1914.]


logarithmic  method  is  peculiarly  effective  when  the  data 
are  essentially  relative;  when  they  exhibit  a  tendency  to 
increase  or  decrease  at  a  fixed  relative  rate ;  or  when  signifi- 
cant proportionalities  between  different  series  of  data  are 
to  be  demonstrated.  Incidentally  it  serves  to  economize 
space,  and  thus  permits  the  inclusion  of  very  diverse  magni- 
tudes in  the  same  figure.  These  are  real  advantages,  which 
clearly  justify  the  use  of  logarithmic  constructions  in  a 
considerable  range  of  graphic  work  —  sometimes  by  them- 
selves, sometimes  in  conjunction  with  other  forms  of  repre- 
sentation.    How    extensively    such    constructions    will    or 



should  supplant  ordinary  figures  on  the  natural  scale  need 
not  now  be  argued.  It  is  enough  to  make  known  their 
fundamental  properties.  When  these  are  generally  ap- 
preciated, we  may  trust  the  ingenuity  and  judgment  of 
statisticians  to  find  for  logarithmic  diagrams  the  place 
that  they  deserve. 




REVIEW  PROBLEMS 
Diagrams and Graphs

1.    From  the  data  in  the  table  below 

Draw  bar  diagrams  comparing  the  foreign  holdings  of  common 
and  preferred  stocks  for  1914  to  1919,  inclusive. 

Distribution    of   Foreign    Holdings    of   the   United    States 
Steel  Corporation  Stock 


                  Foreign Holdings
Year           Common      Preferred

Total       3,646,992      1,167,325

1919          368,895        138,566
1918          491,580        148,225
1917          484,190        140,077
1916          502,632        156,412
1915          696,631        274,588
1914        1,193,064        309,457

2.    From  the  data  in  the  table  below 

Draw  two  types  of  component-parts  diagrams  for  holdings  of  com- 
mon stock,  showing  for  1919  the  proportion  held  by  each  country. 

Foreign  Holdings  of  Shares  of  United  States  Steel 
Corporation  Common  Stock,  by  Countries  and  by  Years 


                                             Years
Countries         Total       1919      1918      1917      1916      1915       1914

Total         3,680,002    358,912   480,163   476,675   496,516   687,177  1,180,559

Canada          246,870     35,686    45,613    41,639    31,662    38,011     54,259
England       1,769,873    166,387   172,453   173,074   192,250   355,088    710,621
France          237,424     28,607    29,700    30,059    34,328    50,193     64,537
Germany           6,932        959       891       612       628     1,178      2,664
Holland       1,398,655    124,558   229,285   229,185   234,365   238,617    342,645
Ireland           5,833        160        19        19       914     1,730      2,991
Italy             1,548        281       281       281       279       280        146
Spain             3,939        555       549       300       510       800      1,225
Sweden              296         70        80        64        68        13          1
Switzerland       8,632      1,649     1,292     1,442     1,512     1,267      1,470


3.  Using  the  following  data  showing  the  number  and  dead- 
weight tonnage  of  vessels  employed  in  the  West  Indian  and  South 
American  West  Coast  trades,  graphically  compare,  by  using  per- 
centages, frequency  graphs  of 

(1)  The  number  of  vessels  engaged. 

(2)  The  tonnage  of  vessels  engaged. 

(3)  Express  the  comparison  in  some  other  satisfactory  form. 

Table Showing the Distribution of Steam Freighters
Classified by Size, Trading between the United States
and the West Indies and with the West Coast of
South America

                        Vessels trading between the United States and
Classified Dwt.             West Indies           South American West Coast
Tonnage of               Number   Aggregate         Number   Aggregate
Vessels                           Dwt. Tons                  Dwt. Tons

Total                      99      281,900            75      434,941

   500- 1,500              14       17,745             3        3,579
 1,500- 2,500              40       79,593             5       10,367
 2,500- 3,500              20       57,530             4       11,100
 3,500- 4,500              11       42,674             8       31,300
 4,500- 5,500               6       29,453             9       43,536
 5,500- 6,500               6       35,755            17      102,995
 6,500- 7,500               -            -            10       68,538
 7,500- 8,500               1        7,850            11       87,148
 8,500- 9,500               -            -             5       45,888
 9,500-10,500               -            -             2       19,140
10,500-11,500               1       11,300             1       11,350

4.    Using    the    following    data    showing    the    Immigrant    Aliens 
admitted  into  the  United  States 

(1)  Construct  a  cumulative  historigram  on  an  "up  to  and  in- 
cluding" basis  for  the  period  in  question. 

(2) Determine from the graph the number of immigrants that
came into the United States during the first quarter of the period,
the first half of the period, the first three-quarters of the period.
Similarly, determine the proportion of the entire time required to
bring in one-quarter of the total number, one-half of the total
number, three-quarters of the total number. Arrange these
measures in the form of a statistical table, and briefly describe them.


Month            1914      1915      1916      1917      1918

January                  15,481    17,293    24,745    ?,250
February                 13,873    24,740    19,238    7,388
March                    19,2?3    27,5??    15,512    ?,510
April                    24,?32    30,5?0    20,523    9,541
May                      2?,???    31,021    10,497   15,217
June                     22,?98    30,7?4    11,085   14,247
July           ??,?77    21,504    25,035     9,307    7,780
August         37,700    21,949    29,975    10,047    7,8?2
September      20,143    24,513    3?,398     9,228    9,997
October        3?,???    25,450    37,050     9,284   11,771
November       2?,2??    24,545    34,437     ?,440    8,499
December       2?,?44    18,901    30,902     ?,987


5.    Using the following data,

(1)  Draw an ordinary historical chart comparing the foreign
holdings of common and preferred shares of stock of the United
States Steel Corporation.

In which has the decrease been more marked? Can this question
be answered from this type of chart? Why?

(2)  Draw a "ratio" chart using ordinary "ratio" paper.¹

In which type of shares has the rate of decrease been most
marked? Compare charts (1) and (2). Which type seems best
suited to illustrate the change in holdings? Why?

1 "Ratio" paper may be secured from the Educational Exhibition Company,
26 Custom House St., Providence, R. I.; Keuffel and Esser Co.,
127 Fulton St., New York City; and the Standard Graph Co., 32 Union
Square, New York City.


DIACJItAMMATK;   AND   (JRAI'JIU;   I'ltKSEMTATrON      309 

Foreign Holdings of Shares of U. S. Steel Corporation

                        Common                  Preferred
Date               Shares    Per Cent      Shares    Per Cent

Mar. 31, 1914    1,285,636    25.29       312,311      8.67
June 30, 1914    1,274,247    25.07       312,832      8.68
Dec. 31, 1914    1,193,064    23.47       309,457      8.59
Mar. 31, 1915    1,130,209    22.23       308,005      8.55
June 30, 1915      957,587    18.84       303,070      8.41
Sept. 30, 1915     826,833    16.27       297,691      8.26
Dec. 31, 1915      696,631    13.70       274,588      7.62
Mar. 31, 1916      634,469    12.48       262,091      7.27
Sept. 30, 1916     537,809    10.58       171,096      4.75
Dec. 31, 1916      502,632     9.89       156,412      4.34
Mar. 31, 1917      494,338     9.72       151,757      4.21
June 30, 1917      481,342     9.45       142,220      3.94
Sept. 30, 1917     477,109     9.39       140,039      3.89
Dec. 31, 1917      484,190     9.52       140,077      3.88
Mar. 31, 1918      485,706     9.56       140,198      3.90
June 30, 1918      491,4?4     9.66       149,032      4.13
Sept. 30, 1918     495,009     9.73       147,845      4.10
Dec. 31, 1918      491,580     9.68       148,225      4.11
Mar. 31, 1919      493,552     9.71       149,832      4.16
June 30, 1919      465,434     9.15       146,478      4.07
Sept. 30, 1919     394,543     7.76       143,840      3.99
Dec. 31, 1919      368,895     7.26       138,566      3.84



[Figure not reproduced: chart headed "England, Pounds Sterling," with a vertical scale of closing quotations in dollars per pound, running from about 4.76 at the top to about 3.68 at the bottom.]

Diagram I. Purports to show the Trend of Sterling Exchange, 1919 ¹

6.    Using the above chart

(1)  Write a criticism of the method in which this curve is drawn.

(2)  Redraw the curve according to the directions in the Text and the
Readings. What differences do you note?

(3)  Write a description of the trend of Sterling exchange based upon
the original chart. How satisfactory is it? Why?

[Figure not reproduced: chart titled "War Record of Bond Prices," showing average price from July 1, 1914, to Jan. 2, 1920.]

Diagram II.

7.    Write a criticism of the above chart.

1 Taken with permission from "Charts of the Fluctuations of Foreign
Exchange Rates for the Year 1919," The First National Bank of Boston.




[Figure not reproduced: component-part diagrams for the United States, 1917 and 1916, with a key whose legible entries include "Taxes, etc." and "Operating expenses."]

Diagram III. Division of the Railway Dollar in October, 1917 and 1916


8.    Using  the  above  diagram 

(1)  Express the relationships by using some other form of component-part
chart. How do you rank the relative methods? Why?

(2)  Place  the  data  in  a  table.     In  what  ways,  if  at  all,  is  the  table 
a  less  satisfactory  method  of  presenting  the  data? 

(3)  Express  the  relationships  of  the  data  in  the  form  of  a  running 
statement  of  not  more  than  100  words. 


Diagram  IV.  —  A  Hundred  Dollars'  Worth  of  Cotton,  1913  to  1916 

9.    Using  the  above  diagram 

(1)  Study  the  proportions  of  the  figures.     Are  the  figures  drawn 
to  scale  ?     What  is  the  scale  ? 




(2)  Redraw  this  diagram  according  to  the  rules  discussed  in 
the  Text  and  Readings. 


[Figure not reproduced: line chart with one curve for hogs and one for cattle; vertical axis, dollars, graduated from 1 to 17; horizontal axis, years.]

Diagram  V.  —  The  Average  Yearly  Prices  of  Cattle  and  Hogs  at  Chicago, 

1903  to  1918 


10.    Discuss  the  construction  of  the  above  diagram  in  the  light 
of  the  discussion  in  the  Text  and  Readings. 




[Figure not reproduced: diagram of the different cuts of meat; vertical axis, cents per pound, graduated from 10 to 40, with the average retail selling price marked at 25 cents; beneath the figure, the weight of each cut in pounds (53, 47, 51, 97, 131, 86, 80, 63, 8) and its per cent of total weight.]

Diagram VI. Weights and Retail Prices of the Different Cuts of Meat


11.    Using  the  above  diagram 

(1)  Describe  the  principles  on  which  it  is  drawn. 

(2)  Write  a  paragraph  descriptive  of  its  contents. 

(3)  Put  in  the  form  of  a  table  the  data  shown  in  the  diagram. 

(4)  Which  is  the  most  effective,  the  description,  the  diagram,  or 
the  table?     Why? 




[Figure not reproduced: two diagrams for the years 1899, 1904, 1909, 1914, and 1917; legible labels include $5,769,000, $23,064,000, $173,837,000, and $1,072,000,000, and the figure 1,740,792 for 1917.]

Diagram  VII.  —  Growth  of  the  Automobile  Industry  and  the  Investment 

behind it, 1899 to 1917

12.    Criticize  the  form  of  these  diagrams,  according  to  the  stand- 
ards established  in  the  Text  and  Readings. 




[Figure not reproduced: line chart; vertical axis, average price, with graduations running from 71 upward past 112; horizontal axis, years 1900 to 1919.]

Diagram VIII. — Course of Average Price of 15 Standard Long Term Railroad
Bonds During Past Twenty Years


13.    Using  the  above  diagram 

(1)  By  what  standards  would  you  test  its  merits? 

(2)  Write a description of the trend of Railroad Bonds based
upon this chart. How satisfactory is it? Why?




Diagram  IX.  —  Location  of  Share-rented  Farms  which  include  Stock-share 

Rented  Farms 


Diagram  X.  —  Location  of  Cash-rented  Farms 



14.    Using  diagrams  IX  and  X, 

(1)  Criticize  the  methods  by  which  they  are  drawn. 

(2)  In  what  way,  if  at  all,  would  you  criticize  the  titles?  Be 
specific. 

(3)  Extract,  from  the  maps,  the  concrete  data,  place  them  in  a 
statistical  table,  and  compare  the  data.  How  does  the  tabular 
method  of  presentation  compare  with  the  graphic? 

(4)  Secure, if possible, county outline maps of Iowa, and redraw
the illustrations according to the instructions in the Text.

(5)  Would  it  be  possible  or  advantageous  to  use,  in  this  case,  any 
one  of  the  type  of  dot  maps  described  in  the  Text  ?     Try  one  type. 


CHAPTER   VII 

AVERAGES  AS  TYPES 

The Use of Averages in Presenting Wage
Statistics ¹

There  are  two  methods  of  presenting  wage  statistics : 
(1)  Computation  of  an  average;  (2)  classification  into 
groups. Each of these methods finds frequent illustration
in  the  current  literature  of  wage  statistics. 

1.  The  Average.  —  In  many  instances  the  only  method 
possible  is  that  of  the  average,  as  when  the  data  returned 
include  only  the  gross  amount  paid  to  a  given  number  of 
workmen.  In  such  a  case  if  a  presentation  of  the  wages  of 
the individual be desired, the only available term is an average
obtained by dividing the total paid in wages by the
number  of  employees.  Such  a  statistical  expression  is  often 
valid  and  instructive,  as  when  the  units  in  the  data  accumu- 
lated are  more  or  less  uniform  in  character  and  the  range  of 
variation is not excessive. At an earlier period when there
was greater equality in social and economic conditions, less
division  of  labor,  and  less  variety  in  industry,  the  average 
was  relatively  a  serviceable  statistical  term;  but  with  the 
development  of  modern  economic  conditions,  characterized 
by  the  greatest  range  between  skilled  and  unskilled  labor, 
by  many  grades  of  hand  and  machine  labor,  and  by  a  multi- 
plication of  occupations,  the  average  has  become  frequently 
misleading.     The  advantage  of  the  average  is  the  ease  with 

1  Adapted  with  permission  from  "Employees  and  Wages,"  Twelfth  Census 
of  the  United  States  Taken  in  the  Year  1900,  1903.  Davis  R.  Dewey,  "Re- 
port," pp.  xxiv-xxviii,  Sec.  VIII. 


which  it  can  be  used  for  formulating  a  statistical  proposition 
in  a  single  number ;  it  is  doubtful,  however,  whether  indus- 
trial phenomena  so  complex  as  wages  can  be  satisfactorily 
reduced  to  a  single  term.  Human  labor  varies  greatly  in  its 
form, depending for its effectiveness upon individual skill,
intelligence,  and  energy,  as  well  as  upon  opportunities  for 
employment.  As  a  result  of  these  variations,  rewards  differ 
greatly.  Although  the  economic  force  of  competition  exerts 
a powerful influence toward uniformity of compensation
for  a  given  unit  of  individual  exertion  as  applied  in  the  manu- 
facture of  products  requiring  the  same  skill  and  intelligence, 
yet  differences  constantly  appear ;  and,  as  shown  by  the 
tables  in  this  report,  these  differences  are  found  not  only 
within  a  well-defined  occupation  in  a  single  section  of  the 
country,  but  even  within  the  same  occupation  as  reported  by 
a  single  establishment.  Some  workmen  receive  high  wages, 
some  medium  wages,  and  some  low  wages ;  the  result  is  a 
composite  picture,  each  element  of  which  possesses  an  in- 
dividual interest  which  should  not  be  lost  sight  of.  The 
student  of  social  conditions  is  interested  to  know  to  how  large 
a  part  of  the  social  mass  certain  characteristics,  qualities, 
or phenomena are applicable; and particularly is this true
in  the  study  of  the  condition  of  labor  and  its  reward.  It  is 
far  more  important  to  know  that  one-half  of  the  laboring 
class receive wages between $1.25 and $1.75 per day, than
to  know  that  the  average  of  the  total  is  $1.50.  The  average 
disregards  the  significance  of  the  parts  and  aims  to  give 
expression  to  the  whole  in  a  single  term. 
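
The point may be illustrated with a small computation. The two groups of wages below are hypothetical, expressed in cents per day, and constructed so that both have an average of $1.50; they are not drawn from the Census report.

```python
from statistics import mean

# Hypothetical daily wages in cents for two groups of eight workmen.
scattered = [90, 110, 130, 145, 155, 170, 190, 210]
uniform = [145, 148, 150, 150, 150, 150, 152, 155]

def share_between(wages, low, high):
    """Fraction of workmen whose wage lies from low to high, inclusive."""
    inside = [w for w in wages if low <= w <= high]
    return len(inside) / len(wages)

for name, wages in (("scattered", scattered), ("uniform", uniform)):
    print(name, mean(wages), share_between(wages, 125, 175))
```

The averages agree, yet only one-half of the first group, as against the whole of the second, receives between $1.25 and $1.75; the single term conceals exactly the fact that the student of social conditions wishes to know.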

2.  Classification  into  Wage  Groups.  —  Since  variations  in 
wages  lose  much  of  their  meaning  when  merged  into  a  single 
term,  the  treatment  of  wage  statistics  should  as  far  as  possible 
be  descriptive,  and  this  is  statistically  accomplished  by  the 
method of classifying wages into groups, as was done, for
example, for certain industries in the Eleventh Census. It
must  be  admitted  that  this  method  is  not  so  simple  as  that 
of  the  average;  it  is  much  more  difficult  to  compare  two 
lines  at  all  their  points  than  to  select  from  these  lines  two 
single  points  and  compare  them.  For  these  reasons  the 
method  of  analysis  used  in  this  report  for  the  purpose  of 
comparing  wages  in  different  occupations  and  at  two  differ- 
ent periods  is  not  as  simple  as  if  the  average  alone  had  been 
used.  This,  however,  should  not  be  regarded  as  a  defect; 
statistical  art  has  its  limitations;  especially  is  this  so  in 
problems  requiring  descriptive  treatment,  such  as  wages. 

An  example  of  the  advantage  of  the  classification  of  wages 
into  groups  over  the  gross  average  is  seen  in  the  following 
illustration, drawn from one of the pay rolls reported. In
this  establishment  there  were  92  employees  in  1890  and  299 
in  1900.  If  a  general  average  be  desired  for  all  the  employees 
at  each  of  these  periods,  the  results  are  an  average  wage  of 
19 cents per hour in 1890 and 17 cents per hour in 1900, making
a reduction of 20 cents per day of 10 hours.¹

The  real  difference  between  1890  and  1900  is,  however, 
better  disclosed  in  the  following  table,  which  classifies  the 
numbers  under  several  rates  of  wages  and  also  reduces  these 
numbers  to  percentages  of  the  respective  totals  for  1890  and 
1900. 

All Employees

                                 1900                  1890
Rates Per Hour (Cents)     Number  Per Cent      Number  Per Cent

Total                        299    100.0           92    100.0

 5 to 9                       52     17.4           13     14.1
10 to 14                      59     19.7            3      3.3
15 to 19                      56     18.7           16     17.4
20 to 24                      47     15.7           28     30.4
25 to 29                      61     20.4           22     23.9
30 to 34                      12      4.0            7      7.6
35 to 39                       7      2.4            2      2.2
40 and over                    5      1.7            1      1.1

1 In computing these averages, the lowest wage in each wage group was
taken as the exact wage for each individual in the group.

From this it will be observed that there is a much larger
amount of low-priced labor in 1900 than in 1890. Does
this mean a reduction in the wages of a given class of employees,
as "machinists," for example? The misleading
character of a gross average applied to an industry group,
as well as the great superiority of a presentation by wage
groups such as those in the above table, is disclosed as soon
as an analysis is made of the several classes of occupations
which go to make up the total. Take, for example, the
"machinists," of whom 52 were returned in 1890 and 74 in
1900. The distribution of "machinists" according to wage
groups is shown in the following table:

Machinists

                          1900                1890
Rates per Hour      Number  Per Cent    Number  Per Cent
(Cents)
Total                   74     100.0        52     100.0
15 to 19                                     5       9.6
20 to 24                10      13.5        19      36.5
25 to 29                47      63.5        20      38.5
30 to 34                 9      12.2         6      11.6
35 to 39                 6       8.1         1       1.9
40 and over              2       2.7         1       1.9


Obviously  the  cause  of  the  apparent  reduction  of  wages 
for  all  employees  is  the  employment  in  1900  of  a  relatively 
larger  number  of  low-priced  employees  than  in  1890,  prob- 
ably due  to  the  introduction  of  improved  machinery,  which 
gives  a  much  larger  output  per  machine,  but  which  demands 
a  considerable  amount  of  unskilled  labor  to  handle,  erect, 
assemble,  pack,  and  ship. 
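
The contrast drawn here between the whole pay roll and a single occupation can be verified from the two tables. The sketch below is offered only as an illustration; it assumes the same lowest-wage-of-the-group convention as the footnote, and the helper names `grouped_average` and `share_below` are hypothetical.

```python
# Counts by hourly wage group (cents), keyed by the lowest wage of the group.
ALL_1890 = {5: 13, 10: 3, 15: 16, 20: 28, 25: 22, 30: 7, 35: 2, 40: 1}
ALL_1900 = {5: 52, 10: 59, 15: 56, 20: 47, 25: 61, 30: 12, 35: 7, 40: 5}
MACH_1890 = {15: 5, 20: 19, 25: 20, 30: 6, 35: 1, 40: 1}
MACH_1900 = {20: 10, 25: 47, 30: 9, 35: 6, 40: 2}


def grouped_average(counts):
    """Average wage, crediting each person with the lowest wage of the group."""
    return sum(r * n for r, n in counts.items()) / sum(counts.values())


def share_below(counts, rate):
    """Per cent of persons in groups whose lowest wage is under `rate`."""
    low = sum(n for r, n in counts.items() if r < rate)
    return 100 * low / sum(counts.values())


# Gross average falls, yet the machinists' average rises.
print(round(grouped_average(ALL_1890), 2), round(grouped_average(ALL_1900), 2))
print(round(grouped_average(MACH_1890), 2), round(grouped_average(MACH_1900), 2))

# Share of the pay roll in the groups under 15 cents per hour.
print(round(share_below(ALL_1890, 15), 1), round(share_below(ALL_1900, 15), 1))
```

On these figures the average for machinists is higher in 1900 than in 1890, while the proportion of all employees in the groups under 15 cents roughly doubles, which is the change in composition described in the text.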

Another  illustration  may  be  found  in  an  establishment 
manufacturing  fine  glazed  kid.  In  1890  there  were  55  em- 
ployees, all  men,  and  in  1900,  70,  of  whom  12  were  women. 
The  difference  in  the  wages  received  by  males  is  shown  in 
the following table:


Males in Glazed-kid Factory

                          1900                1890
Rates per Week      Number  Per Cent    Number  Per Cent
(Dollars)
Total                   58     100.0        55     100.0
20 and over              2       3.5         1       1.8
15 to 20                 2       3.5         3       5.4
12 to 15                 6      10.3        14      25.5
10 to 12                 6      10.3        15      27.3
9 to 10                  3       5.2         4       7.3
8 to 9                  17      29.3         5       9.1
7 to 8                  14      24.1         3       5.4
6 to 7                   4       6.9         4       7.3
5 to 6                   1       1.7         4       7.3
4 to 5                   3       5.2         2       3.6

It  will  be  observed  that  there  is  a  marked  reduction  in 
the  higher-priced  labor.  This  is  due  to  changes  which  have 
taken  place  during  the  past  decade  in  the  manufacture  of 
leather. For example, the reduction in the number of
"beamsters" (skilled workmen who remove the superfluous flesh
from the hides with a slicking machine) is a result of the
introduction of machinery which permits the employment of
a greater proportion of unskilled labor. Moreover, the
manner of coloring has been changed from table coloring to
box coloring; by the former method the color was put on with
a brush, whereas now the skins are dipped into a box of
coloring liquid. An analysis of the wages of the "beamsters" and
the "colormen" does not show any reduction in the wages
for the first class.


                        Beamsters                    Colormen
Rates per Week      Number      Per Cent        Number      Per Cent
(Dollars)         1900  1890   1900   1890    1900  1890   1900   1890
Total                5    10  100.0  100.0       3     9  100.0  100.0

19.00-to  19.49 
15.00  to  15.49 
13.00  to  13.49 
12.50  to  12.99 
12.00  to  12.49 
11.00  to  11.49 
10.00  to  10.49 
9.00  to  9.49 

4 
1 

1 
2 
7 

80.0 
20.0 

10.0 
20.0 
70.0 

1 

1 
1 

1 

3 
4 
1 

33.3 
33.3 
33.4 

11.1 

33.3 
44.5 
11.1 

3.  Cumulative  Percentage.  —  There  is  one  practical  defect 
in  classified  rates  which  often  impairs  their  usefulness.  This 
lies  in  the  difficulty  of  comparing  two  given  sets  of  returns  so 
as  to  ascertain  what  differences  may  exist  or  what  changes 
may have taken place; even if the figures in a classified group
table be reduced to percentages, the real differences between
the two sets of figures are not always easily recognized. For
this reason the cumulative percentage has been used in all
the detailed tables.

                  Actual Number     Percentage       Cumulative     Median and
Rates per Week  at Rate Specified   in the Group     Percentage   Quartile Groups
(Dollars)           1900    1890    1900    1890    1900    1890  1900  1890
Total                759     572   100.0   100.0
3.50 to 3.99           7       5     0.9     0.9   100.0   100.0
4.00 to 4.49          10       7     1.3     1.2    99.1    99.1
4.50 to 4.99          23      15     3.1     2.6    97.8    97.9
5.00 to 5.49          31       9     4.1     1.6    94.7    95.3
5.50 to 5.99          12       3     1.6     0.5    90.6    93.7
6.00 to 6.49          53      40     7.0     7.0    89.0    93.2
6.50 to 6.99           7       3     0.9     0.5    82.0    86.2
7.00 to 7.49          22       6     2.9     1.1    81.1    85.7
7.50 to 7.99          46      37     6.1     6.5    78.2    84.6     Q
8.00 to 8.49           5       5     0.6     0.9    72.1    78.1
8.50 to 8.99           1       2     0.1     0.3    71.5    77.2
9.00 to 9.49          92      42    12.2     7.3    71.4    76.9           Q
9.50 to 9.99          22       6     2.9     1.1    59.2    69.6
10.00 to 10.49        24      30     3.2     5.2    56.3    68.5
10.50 to 10.99        60      45     7.9     7.9    53.1    63.3     M
11.00 to 11.49        25      31     3.3     5.4    45.2    55.4
11.50 to 11.99         1       5     0.1     0.9    41.9    50.0           M
12.00 to 12.49       100      61    13.2    10.7    41.8    49.1
12.50 to 12.99         2       3     0.3     0.5    28.6    38.4
13.00 to 13.49         3       1     0.4     0.2    28.3    37.9
13.50 to 13.99        75      62     9.9    10.8    27.9    37.7     Q
14.00 to 14.49         7       4     0.9     0.7    18.0    26.9
14.50 to 14.99         1       1     0.1     0.2    17.1    26.2
15.00 to 15.49        62      72     8.2    12.6    17.0    26.0           Q
15.50 to 15.99        13       2     1.7     0.3     8.8    13.4
16.00 to 16.49         1       1     0.1     0.2     7.1    13.1
16.50 to 16.99        16      22     2.1     3.8     7.0    12.9
17.00 to 17.49         2       2     0.3     0.3     4.9     9.1
17.50 to 17.99         1       1     0.1     0.2     4.6     8.8
18.00 to 18.49        19      17     2.5     3.0     4.5     8.6
18.50 to 18.99         1       1     0.1     0.2     2.0     5.6
19.00 to 19.49         1       1     0.1     0.2     1.9     5.4
19.50 to 19.99         6       3     0.8     0.5     1.8     5.2
20.00 to 20.49         4       2     0.5     0.3     1.0     4.7
20.50 to 20.99                 1             0.2     0.5     4.4
21.00 to 21.49         3       6     0.4     1.1     0.5     4.2
21.50 to 21.99                 2             0.3     0.1     3.1
22.00 to 22.49                 1             0.2     0.1     2.8
22.50 to 22.99                 4             0.7     0.1     2.6
23.00 to 23.49                 4             0.7     0.1     1.9
23.50 to 23.99                 1             0.2     0.1     1.2
24.00 to 24.49                 3             0.5     0.1     1.0
24.50 to 24.99                                       0.1     0.5
25.00 to 25.49         1       3     0.1     0.5     0.1     0.5

The figures in the cumulative percentage
column  represent  the  proportion  of  the  total  number  of  per- 
sons in  the  given  table  receiving  a  wage  as  great  as,  or  greater 
than,  the  lowest  wage  of  the  given  wage  group.  The  table 
above  shows  the  advantages  of  this  method  of  presentation, 
and  also  the  method  of  interpretation. 

From  this  table  it  is  possible  to  determine  how  large  a 
proportion  of  the  total  number  of  employees  is  receiving  as 
much  as,  or  more  than,  a  given  wage.  For  example,  the 
columns  headed  "cumulative  percentage"  show  that  in 
1900  the  proportion  of  the  total  number  receiving  $8  or  more 
per  week  was  72.1  per  cent,  while  in  1890  it  was  78.1  per  cent ; 
at  $10  the  respective  proportions  were  56.3  and  68.5  per  cent ; 
and  at  $15  they  were  17  and  26  per  cent.  From  the  columns 
of  cumulative  percentages  it  is  evident  that  wages  were  higher 
in  1890  than  in  1900,  a  fact  clearly  disclosed  neither  by  the 
numbers  nor  by  the  percentages  in  the  respective  groups. 
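
The construction of a cumulative percentage column may be sketched as follows. The example is only an illustration: it uses the 1900 column of the small hypothetical table given later in this reading (chosen because its total is 100, so the percentages are exact), and the function name `cumulative_percentages` is an assumption of this sketch.

```python
# Lowest weekly rate of each wage group (dollars) and the number of
# persons in it: the 1900 column of the hypothetical table in this reading.
RATES = [5.00, 5.50, 6.00, 6.50, 7.00, 7.50, 8.00, 8.50, 9.00]
COUNTS_1900 = [30, 10, 6, 2, 2, 2, 29, 10, 9]


def cumulative_percentages(counts):
    """Per cent of all persons receiving at least the lowest wage of each group."""
    total = sum(counts)
    remaining = total            # persons in this group or a higher one
    result = []
    for n in counts:
        result.append(100.0 * remaining / total)
        remaining -= n           # drop this group before moving up the scale
    return result


cum = cumulative_percentages(COUNTS_1900)
for rate, pct in zip(RATES, cum):
    print(f"${rate:.2f} or more: {pct:.0f} per cent")
```

Each entry answers the question put in the text, namely what proportion of the force receives as much as, or more than, a given wage. In the large table above, percentages computed directly from the counts may differ from the printed column by about a tenth of 1 per cent, presumably because of rounding.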

4.  Median  and  Quartiles.  —  The  use  of  the  column  of  cu- 
mulative percentages  makes  it  easy  to  determine  the  range 
of  wages  for  any  given  proportion  of  the  working  force ;  by 
this means also it is possible to indicate the wage group of
the employee who stands half-way between the lowest-paid
and  the  highest-paid  employee  in  the  class  under  considera- 
tion. For  example,  in  the  above  table,  it  is  seen  that  when 
the  employees  in  1900  are  arranged  in  a  sequence  according 
to  their  rates  of  pay,  beginning  with  the  lowest  rate  and  pro- 
ceeding upward,  the  wage  of  the  three  hundred  and  eightieth 
or  middle  employee  lies  between  $10.50  and  $11.00.  The 
middle  term  in  a  series  of  this  character  is  called  the  "me- 
dian." By  the  use  of  the  median,  employees  at  excep- 
tional rates  either  low  or  high,  are  not  given  an  undue  weight 
or  importance  as  they  are  when  the  average  is  used.  Fre- 
quently, however,  the  median  will  not  vary  greatly  from  the 
average ;  in  the  foregoing  table,  for  example,  the  average  in 
1900 is $10.55, and in 1890, $11.63.¹ . . .

Another  advantage  of  the  cumulative  percentage  lies  in 
the facility in showing the wages of the employees who stand
at selected points along the whole series of employees, as,
for example, at one-quarter and three-quarters up the
ascending scale. The terms at these particular points are called
"quartiles," and within these two limits would clearly fall
the wages of at least one-half of the working force. Thus,
it will be seen that in 1900 the wages of the employee who
stands one quarter of the way up the scale lie in the wage
group $7.50 to $7.99; and in 1890, in the group, $9.00 to
$9.49. The wages of the employee standing three-quarters
of the way up the scale lie in the wage group $13.50 to $13.99
in  1900,  and  in  the  group  $15.00  to  $15.49  in  1890.  It  is  evi- 
dent, then,  that  the  wages  of  what  may  be  termed  the  middle 
half  of  the  employees  were  between  $7.50  and  $13.99  in  1900, 
and  between  $9.00  and  $15.49  in  1890.  Such  a  statement, 
however, does not preclude the possibility that more than
one-half of the employees receive wages between the two
limits named; it is entirely possible that 60, 70, or a greater
per cent of the working force receive wages within these
limits. The method does, however, justify the statement
that at least one-half receive the wages stated; there may
be more, but there cannot be less.

¹ In computing these averages, the lowest wage in each wage group was
taken as the exact wage for each individual in the group.

5. Limitations in the Use of the Median and Quartiles. —
The  limitations  in  the  use  of  the  median  and  quartiles  are 
of  so  important  a  character  that  they  deserve  special  mention. 
The  use  of  the  median  for  the  comparison  of  two  series  of 
wages  is  subject  to  the  following  drawbacks  :  The  wage  scale 
may  be  so  precise  that  the  tables  present  data  in  scattered 
groups  rather  than  in  even  distribution  throughout  the  series  ; 
then  since  the  median  can  never  fall  in  any  group  not  repre- 
sented by  actual  returns,  the  change  of  a  few  individuals  may 
cause  a  wide  shifting  of  the  position  of  the  median.  Or, 
the  groups  containing  relatively  large  numbers  may  be  at  a 
distance  from  the  median  group,  while  the  group  containing 
the  median  and  the  groups  near  to  it  may  represent  only  a 
few  persons  ;  in  that  case  also  the  change  of  a  few  individuals 
about  the  median  rates  may  appear  unduly  significant.  The 
shifting  of  a  comparatively  small  number  of  persons  upward 
or  downward  across  the  median  point  may  thus  cause  the 
position  of  the  median  group  to  change  in  a  marked  degree. 
On the other hand, the shifting through a considerable
distance of comparatively large numbers of persons will not
affect the position of the median, provided the median point
is not crossed. This is illustrated by the following table.

                                                       Position of
Rates per Week     Actual Number     Cumulative        Median and
(Dollars)                            Percentage        Quartiles
                    1900    1890    1900    1890    1900  1890
Total                100     100
5.00 to 5.49          30       6     100     100     Q
5.50 to 5.99          10      10      70      94
6.00 to 6.49           6      30      60      84           Q
6.50 to 6.99           2       2      54      54
7.00 to 7.49           2       3      52      52           M
7.50 to 7.99           2       1      50      49     M
8.00 to 8.49          29       9      48      48     Q
8.50 to 8.99          10      10      19      39
9.00 to 9.49           9      29       9      29           Q

It will be noted that at both periods there was a combined
total of four persons in groups $7.00 to $7.49 and $7.50 to
$7.99, while the number of persons both above and below
these two groups remained the same (48); and that while the
median group was $7.00 to $7.49 in 1890, the shifting of one
person upward in the scale made $7.50 to $7.99 the median
group  in  1900.  Yet,  although  the  median  advanced  a  50- 
cent  group,  a  heavy  fall  actually  took  place  in  the  wages  of 
the  majority  of  the  persons  shown  in  the  table.  The  median 
group  would  not  have  changed  but  for  the  shifting  of  one 
person  from  group  $7.00  to  $7.49  to  group  $7.50  to  $7.99. 
If,  instead  of  the  shifting  of  one  of  the  four  persons  shown 
at  each  period  in  groups  $7.00  to  $7.49  and  $7.50  to  $7.99,  the 
numbers  in  each  of  these  groups  had  remained  the  same  at 
both  periods,  the  median  group  would  not  have  changed. 
The  median  is  changed  only  by  a  transfer  of  employees  from 
rates  above  the  median  group  to  rates  below  it,  or  vice  versa. 
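
The behavior of the median in this hypothetical case can be reproduced with a short sketch. It assumes the rule that appears to underlie the tables in this reading: the median group is the highest wage group whose cumulative percentage is at least 50, and the quartile groups are found in the same way at 75 and 25 per cent. The function names are hypothetical.

```python
GROUPS = [
    "5.00 to 5.49", "5.50 to 5.99", "6.00 to 6.49",
    "6.50 to 6.99", "7.00 to 7.49", "7.50 to 7.99",
    "8.00 to 8.49", "8.50 to 8.99", "9.00 to 9.49",
]
COUNTS = {
    1890: [6, 10, 30, 2, 3, 1, 9, 10, 29],
    1900: [30, 10, 6, 2, 2, 2, 29, 10, 9],
}


def group_at(counts, at_least_pct):
    """Highest wage group reached or exceeded by at least `at_least_pct` per cent."""
    total = sum(counts)
    remaining = total            # persons in this group or a higher one
    chosen = None
    for label, n in zip(GROUPS, counts):
        if 100 * remaining >= at_least_pct * total:
            chosen = label
        remaining -= n
    return chosen


def summary(counts):
    return {
        "lower quartile": group_at(counts, 75),
        "median": group_at(counts, 50),
        "upper quartile": group_at(counts, 25),
    }


for year in (1890, 1900):
    print(year, summary(COUNTS[year]))
```

Under this assumed rule the median group rises from $7.00 to $7.49 in 1890 to $7.50 to $7.99 in 1900, as the text states, while both quartile groups fall by two 50-cent steps.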

The  above  mentioned  defects  in  the  use  of  the  median 
alone  are  inherent  also  in  the  use  of  a  single  quartile,  and  to 
some  extent  in  the  use  of  quartiles  in  pairs.  The  data  at 
the ends of a scale of wage rates are more likely to be
concentrated into isolated groups than those near the center.

6.  Medians  with  Quartiles.  —  The  presentation,  however, 
of the median group and the quartile groups together, shows
the change in wages at three equidistant points on the scale,
and  will  as  a  rule  show  concisely  what  the  general  course  of 
wages  has  been.  Thus,  in  the  foregoing  hypothetical  example, 
while  the  use  of  the  median  group  alone  would  have  been 
misleading,  a  consideration  of  the  median  in  connection  with 
the  quartiles  shows  that  the  slight  advance  in  the  median 
group  was  due  to  peculiar  grouping  and  scarcity  of  data  at 
that point, and that there was in fact a considerable fall in
wages  in  the  establishment  during  the  decade.  Data  present- 
ing such  irregularity  of  distribution  will  more  often  be  found 
where  returns  for  two  or  more  widely  distinct  occupations, 
or  different  grades  of  skill  in  the  same  occupation,  are  shown 
in  the  same  table ;  with  such  data,  the  median  and  one 
quartile  will  often  be  in  the  same  group.  Such  a  combina- 
tion might  be  found  in  the  "total"  for  an  industry,  and  this 
possibility  affords  an  additional  reason  for  analyzing  wage 
returns  into  occupations  as  specific  as  possible. 

Weighted Averages and Crop Reporting ¹

The  numerical  method  by  which  the  condition  of  growing 
crops  is  measured  in  Germany  is  simple  in  result,  but  some- 
what complex  in  operation.  In  the  scale  adopted  1  represents 
very  good,  2  good,  3  medium  or  average,  4  poor,  5  very  poor. 
Each  correspondent  attaches  one  or  the  other  of  these  figures 
to  each  of  the  crops  reported  on,  and  the  averages  are  worked 
out  in  the  central  office  for  the  whole  of  Germany.  Corre- 
spondents are  instructed  to  avoid  giving  any  range  as  would 
be implied, for instance, by the use of numbers 1-4, 2-4, 3-5,
etc.; where closer estimates are desirable and possible they
are permitted to use a decimal point. Thus, if the condition
of a crop is considered to be midway between 2 and 3, it may
be registered as 2.5, and so on for other gradations.

¹ Adapted with permission from Godfrey, Ernest H., "Methods of Crop
Reporting in Different Countries," Journal of the Royal Statistical Society,
Vol. 73, 1910, pp. 265-266.

Where  there  are  disturbing  factors  which  prevent  the  appli- 
cation of  a  single  figure  to  the  whole  crop  of  a  district,  as, 
for  instance,  where  a  wheat  crop  on  a  large  area  of  clay  soil 
may  be  excellent  while  that  on  another  area  of  sandy  soil 
may  be  the  reverse,  or  where  crops  differ  owing  to  their  cul- 
tivation on  marshlands,  uplands,  etc.,  the  correspondent  is 
instructed  as  to  the  method  he  should  adopt  in  order  to 
arrive  at  a  number  which  fairly  expresses  the  condition  of 
the  crop  for  the  whole  of  his  district.  He  first  estimates 
approximately  the  area  of  the  crop  under  each  different  cate- 
gory, assigns  to  each  the  number  which  properly  expresses  its 
condition,  and  then  works  out  an  average  figure  for  the  crop 
in  the  whole  district.  The  following  is  a  concrete  example 
of  the  method  recommended.  Assume  that  the  figure  2 
representing "good" expresses the condition of winter rye
on  marshlands  occupying  seven-tenths  of  the  whole  area 
of  the  crop  in  the  district ;  that  3  or  "medium"  is  the  condi- 
tion of  two-tenths  of  the  crop  on  clay,  and  that  5  or  "very 
poor"  is  applied  to  the  remaining  tenth  on  sandy  soil,  the 
average  condition  of  winter  rye  for  the  whole  district  will  be 
reckoned  as  follows : 

$$\frac{7}{10}\times 2+\frac{2}{10}\times 3+\frac{1}{10}\times 5=\frac{14+6+5}{10}=\frac{25}{10}=2.5.$$

The  yield  of  a  crop  in  a  district  of  unequal  conditions  is 
estimated  on  the  same  principle.  Thus,  assume  that  the 
oat  crop  of  a  district  is  divided  into  seven-tenths  on  marsh- 
lands and  three-tenths  on  sand,  that  the  former  proportion 
yields  at  the  rate  of  20  double  zentners,  and  the  latter  at  10 
double  zentners  per  hectare,  the  average  yield  of  the  oat  crop 
for  the  whole  district  will  be  computed  as  follows: 


$$\frac{7}{10}\times 20+\frac{3}{10}\times 10=\frac{140+30}{10}=\frac{170}{10}=17\ \text{double zentners per hectare.}$$

The  same  principle  of  computation  is  expected  to  be  applied 
by  the  correspondent  in  cases  where  the  crops  have  been 
partly  injured  by  drought,  wet,  frost,  hail,  storms,  cloud- 
bursts, flood,  animal  and  plant  pests,  etc.,  the  result  being 
in  all  cases  reported  to  the  Office  as  the  average  yield  for  the 
cultivated  area  in  the  district. 
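
The two computations above are instances of one weighted average, in which each condition figure or yield is weighted by the share of the crop area to which it applies. A minimal sketch follows; the function name `weighted_average` and the use of exact fractions are choices made for this illustration only.

```python
from fractions import Fraction


def weighted_average(parts):
    """parts: (share of crop area, value) pairs; the shares must sum to 1."""
    parts = list(parts)
    if sum(share for share, _ in parts) != 1:
        raise ValueError("area shares must sum to 1")
    return sum(share * value for share, value in parts)


# Condition of winter rye: 2 on seven-tenths, 3 on two-tenths, 5 on one-tenth.
rye_condition = weighted_average([
    (Fraction(7, 10), 2),
    (Fraction(2, 10), 3),
    (Fraction(1, 10), 5),
])

# Yield of oats: 20 double zentners on seven-tenths, 10 on three-tenths.
oat_yield = weighted_average([
    (Fraction(7, 10), 20),
    (Fraction(3, 10), 10),
])

print(float(rye_condition), float(oat_yield))
```

The check on the shares guards against the most likely slip in practice, that the estimated fractions of the area do not account for the whole crop.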

Compensating  Errors  —  The  Logic  of  Large 
Numbers in Crop Reporting ¹

Crop  reports  are  sometimes  called  guesses,  because  they 
are  based  upon  estimates  instead  of  actual  measurements. 
Of  course  such  estimates  are  not  haphazard  guesses ;  that  is, 
no  one  would  likely  estimate  the  yield  of  corn  at  100  bushels 
per  acre  when  it  is  actually  only  15  bushels,  nor  estimate  the 
yield  at  15  bushels  when  it  is  actually  100  bushels.  Neverthe- 
less, nearly  every  individual  estimate  has  an  element  of  error. 
Combination  of  individual  estimates  into  a  general  average 
tends  to  reduce  the  error  in  the  average.  The  manner  and 
extent  to  which  this  is  done  may  be  of  interest  to  the  many 
crop  reporters  (and  others)  who  frequently  feel  that  their 
individual  estimate  may  be  wide  of  the  truth,  but  who  may 
not  understand  fully  the  effect  of  combining  the  estimates 
of  many  individuals  and  thus  securing  an  accurate  average. 

For  the  purpose  of  analysis  or  study,  any  error  in  an  in- 
dividual estimate  may  be  considered  as  made  up  of  two  parts, 
namely  that  part  which  is  due  to  chance  and  that  part  which 
is  due  to  bias. 

¹ Adapted with permission from "Monthly Crop Reporter," United
States Department of Agriculture, March, 1919, p. 31.

A  reporter  once  told  us  that  his  father  could  go  through 
an  orchard  and  estimate  its  production  more  closely  than 
any  other  person  in  his  section,  but  that  he  (the  reporter) 
could  make  a  better  estimate  than  his  father,  after  he  knew 
his  father's  estimate,  because  he  had  observed  that,  although 
his  father  made  a  close  estimate,  it  usually  fell  under  rather 
than  over  the  final  outcome ;  therefore,  by  making  allowance 
for  this  tendency,  he  could  use  his  father's  estimate  to  make 
a  still  closer  one. 

A  bias  in  an  estimate  is  that  part  of  the  error  that  tends  to 
make  it  lean  more  on  one  side  of  the  actual  truth  than  on 
the  other.  The  chance  error  is  that  part  that  is  equally  likely 
to  be  above  as  below  the  truth. 

The  chance  error  in  an  average  of  a  number  of  individual 
estimates  tends  to  decrease  as  the  number  of  estimates  in- 
cluded in  the  average  increases.  Suppose  any  one  man's 
estimate  is  taken ;  so  far  as  there  is  no  bias,  his  estimate  is 
just  as  likely  to  be  too  high  as  too  low  or  vice  versa.  Sup- 
pose we  get  an  estimate  from  two  men ;  both  may  be  too 
high,  or  both  may  be  too  low,  or  the  first  may  be  too  high 
and  the  second  too  low ;  or  the  first  may  be  too  low  and  the 
second  too  high.  Observe  that  there  are  four  possible  ar- 
rangements. There  is  one  chance  in  four  that  both  will 
be  too  high  and  one  chance  in  four  that  both  estimates  will 
be  too  low,  but  two  chances  in  four,  that  is,  an  even  chance, 
that  one  estimate  will  be  too  high  and  the  other  too  low, 
thus  offsetting  each  other.  If  estimates  from  four  men  are 
taken,  there  will  be  16  possible  arrangements,  and  there 
will  be  only  1  chance  in  16  that  all  will  be  too  low,  but  6 
chances  in  the  16  that  there  will  be  2  overestimates  offsetting 
2  underestimates.  And  thus,  as  the  number  of  estimates 
taken  increases,  the  chance  errors  tend  to  neutralize  or  offset 
each other. If only 50 random estimates are obtained and
averaged, the probability that all the chance errors will be
on  the  same  side  (that  is,  overestimates  or  underestimates) 
will  be  only  1  chance  out  of  562,949,953,421,312. 
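
The counts of arrangements used in this argument can be reproduced directly. The sketch assumes, as the passage does, that each estimate is independently as likely to be too high as too low; the function names are illustrative only.

```python
from fractions import Fraction
from math import comb


def arrangements(n):
    """Number of possible over/under patterns among n estimates."""
    return 2 ** n


def ways_evenly_split(n):
    """Patterns in which overestimates exactly balance underestimates (n even)."""
    return comb(n, n // 2)


def chance_all_same_side(n):
    """Probability that all n estimates err on one side, whether over or under."""
    return Fraction(2, 2 ** n)


print(arrangements(2), ways_evenly_split(2))   # two men
print(arrangements(4), ways_evenly_split(4))   # four men
print(chance_all_same_side(50))                # fifty estimates
```

For 50 estimates the two one-sided patterns out of 2 to the 50th power reduce to 1 chance in 2 to the 49th power, which is the figure quoted in the text.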

If  the  probable  chance  error  of  an  individual's  estimate  is 
10  per  cent,  the  probable  error  of  the  average  of  25  reporters 
will  be  only  2  per  cent  and  the  probable  error  of  the  average  of 
50,000  reporters  will  be  less  than  one-twentieth  of  1  per  cent. 
An  individual  may  miss  the  mark  as  much  as  30  per  cent, 
but,  in  so  far  as  it  is  equally  likely  to  be  too  high  as  too  low, 
the  combination  of  2500  such  estimates  (the  usual  number 
of  returns  from  county  reporters)  would  give  an  average 
which,  by  the  law  of  averages,  would  likely  be  within  six- 
tenths  of  1  per  cent  of  accuracy. 
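
The figures in this paragraph are consistent with the familiar rule that the chance error of an average of n independent estimates is the individual error divided by the square root of n. The source does not state the rule in this form, so the sketch below should be read as an assumption that happens to reproduce its numbers.

```python
import math


def error_of_average(individual_error_pct, n_reports):
    """Chance error of the average of n independent, unbiased estimates."""
    return individual_error_pct / math.sqrt(n_reports)


print(error_of_average(10, 25))       # 25 reporters, 10 per cent each
print(error_of_average(10, 50_000))   # 50,000 reporters
print(error_of_average(30, 2_500))    # 2500 county reporters, 30 per cent each
```

The rule also shows why the gain from additional reports diminishes: to halve the error of the average, the number of reports must be quadrupled.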

It  is  because  of  this  mathematical  law  of  averages  by  which 
large  numbers  of  chance  errors  in  combination  tend  to  offset 
each  other,  that  the  Bureau  of  Crop  Estimates,  at  small  cost 
as  compared  with  cost  of  an  actual  enumeration,  can  estimate 
so  closely  the  condition  and  production  of  crops. 

The  bias  factor  in  errors  of  estimates  is  more  complex  than 
the  chance  error  or  guess ;  it  is  not  eliminated  or  reduced  by 
increasing  the  number  of  reports ;  it  does,  however,  become 
more  and  more  nearly  constant ;  and  when  a  biased  estimate 
is  compared  with  a  similar  biased  estimate,  the  bias  is  neu- 
tralized and  thus  does  not  affect  the  result.  For  example, 
suppose  the  yield  per  acre  of  wheat  one  year  is  actually  10 
bushels,  and  the  reporter,  by  bias,  overestimates  the  crop 
10 per cent; he will report the yield 11 bushels; suppose,
again,  that  the  true  yield  next  year  is  20  bushels,  and  the 
reporter,  by  bias,  continues  to  overestimate  the  crop  10 
per  cent ;  he  will  report  the  yield  22  bushels.  It  will  be 
observed  that  the  reporter's  estimates  for  the  two  years,  11 
and  22,  show  the  true  change  ;  that  is,  a  doubling  of  the  yield, 
notwithstanding that both estimates were erroneous by 10
per cent. Of course bias is not the same all the time. But
the  combination  of  large  numbers  of  reports,  obtained  from 
practically  the  same  men  and  compiled  in  the  same  way 
from  month  to  month  and  year  to  year,  tends  to  stabilize 
the  results  and  make  them  truly  comparable,  if  not  absolutely 
correct. 
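
The neutralizing of a constant bias in a comparison may be shown with the wheat example. The sketch assumes that the bias is a fixed percentage applied in both years; `reported_yield` is a hypothetical helper, and exact fractions are used only to keep the arithmetic free of rounding.

```python
from fractions import Fraction


def reported_yield(true_yield, bias_pct):
    """Yield as reported by an observer who overstates by bias_pct per cent."""
    return true_yield * (1 + Fraction(bias_pct, 100))


first_year = reported_yield(10, 10)    # true yield 10 bushels
second_year = reported_yield(20, 10)   # true yield 20 bushels

print(first_year, second_year)
print(second_year / first_year)        # ratio of the two reports
```

The ratio of the reports equals the ratio of the true yields whatever the size of the constant bias; if the bias differed between the two years, the ratio would no longer be exact.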

REVIEW 

1.  Compare  the  discussion  with  that  in  Chapter  II  on  Govern- 
ment Crop  Reports. 

2.  Restate  the  case,  as  developed  in  the  citation  immediately 
above,  for  the  use  of  the  normal  in  crop  reporting.  What  relation, 
if  any,  has  this  discussion  to  compensating  errors  as  here  developed  ? 

The Calculation of the Average Tariff Duty or Rate ¹

It  is  impossible  to  compare  directly,  in  any  broad  way, 
the  rates  of  duty  in  different  tariff  acts.  The  number  of 
items  is  large,  and  while  some  are  of  great  commercial  im- 
portance, others  are  of  little  importance.  The  most  prac- 
tical means  of  comparison  is  to  ascertain  the  value  of  imports 
for  all  articles  or  for  a  group  of  articles,  and  also  the  cor- 
responding amount  of  duty  collected,  and,  by  dividing  the 
amount  of  duty  by  the  value,  compute  the  average  ad 
valorem  rate  of  duty,  or,  as  it  is  often  called,  the  average  ad 
valorem  duty.  This  method  permits  ease  of  comparison,  but, 
like  all  averages,  has  serious  defects.  Aside  from  changes  in 
price  level,  the  volume  of  imports  affects  the  average  ad 
valorem  rate  just  as  much  as  the  rate  prescribed  in  the 
tariff.  If,  with  the  same  tariff  rates  in  force,  one  year  is 
marked by specially large imports of goods dutiable at high
tariff rates and the succeeding year by specially large imports
of goods subject to relatively low tariff rates, the average
ad valorem rate of duty will sharply decline. With all its
limitations, however, the average ad valorem rate of duty
remains the only convenient means of comparing the general
level of duties for different years.

¹ Adapted with permission from "Foreign Commerce and the Tariff,
1899-1915," 1916. Senate Document No. 366, 64th Congress, 1st Session,
pp. 8-9, 13-16.
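
The dependence of the average ad valorem rate on the composition of imports can be seen in a small sketch. All the figures in it are hypothetical (two classes of goods dutiable at 50 and 10 per cent, and import values of 80 and 20); they are chosen only to illustrate the point and do not come from the source.

```python
from fractions import Fraction


def average_ad_valorem(imports):
    """imports: (value, statutory rate) pairs; returns duty collected / value."""
    total_value = sum(value for value, _ in imports)
    total_duty = sum(value * rate for value, rate in imports)
    return total_duty / total_value


# Hypothetical statutory rates, unchanged between the two years.
HIGH = Fraction(50, 100)
LOW = Fraction(10, 100)

year_1 = average_ad_valorem([(80, HIGH), (20, LOW)])   # mostly high-duty goods
year_2 = average_ad_valorem([(20, HIGH), (80, LOW)])   # mostly low-duty goods

print(float(year_1), float(year_2))
```

With the same rates in force in both years, the average falls from 42 to 18 per cent solely because the bulk of imports shifted to the goods bearing the lower rate.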

In  discussing  the  average  rate  of  duty  on  imports  at  differ- 
ent periods,  the  calculation  is  made  on  the  basis  both  of 
total  imports  for  consumption  and  of  dutiable  imports  for 
consumption.  For  some  purposes  it  is  desirable  to  show  the 
average  contribution  for  all  goods  entering  the  country,  and 
this  is  best  disclosed  by  dividing  the  amount  of  duty  collected 
by  the  total  imports.  For  other  purposes  it  is  better  to  show 
the  level  of  duties  on  articles  that  are  dutiable,  and  for  this 
comparison  the  duties  are  divided  by  the  amount  of  dutiable 
imports.  In  making  the  latter  computation  account  is 
taken  only  of  the  ordinary  duties,  while  the  so-called  addi- 
tional duties,  varying  in  amount  from  $1,198,621  in  the  fiscal 
year  1899  to  $191,769  in  the  fiscal  year  1915,  are  excluded. 
These  additional  duties  represent,  in  part,  the  penalty  im- 
posed on  articles  undervalued,  which  is  reported  only  in  the 
aggregate  and  not  in  respect  to  individual  articles,  and  in 
part  the  refund  of  drawback  and  the  duty,  equivalent  to 
internal-revenue  tax,  collected  on  articles,  grown,  produced, 
or  manufactured  in  the  United  States  when  reimported  after 
having  been  exported.  Since  these  articles  are  free  of  ordi- 
nary customs  duty,  they  must  be  excluded  from  considera- 
tion in  reckoning  the  average  ad  valorem  rate  on  dutiable 
goods.  The  additional  duties,  are,  however,  included  in  com- 
puting the  average  ad  valorem  rate  of  duty  on  total  imports. 


Average  Ad  Valorem  Duty  under  Recent  Tariffs 

The  average  rate  of  duty  on  imports  under  the  Under- 
wood-Simmons tariff  shows  a  less  marked  decrease  from  the 
previous  rates  than  is  frequently  inferred. 

The  fairest  comparison  is  undoubtedly  from  the  date  when 
the  law  became  effective  to  the  end  of  the  fiscal  year,  1914, 
just  one  month  before  the  outbreak  of  the  European  war. 
Since  the  provision  admitting  wool  free  of  duty  did  not  be- 
come effective  until  December  1,  1913,  and  the  new  rates  of 
duty  on  manufactures  of  wool  did  not  become  effective  until 
January  1,  1914,  imports  of  these  articles  are  included  only 
for  the  six  months,  January  1  to  June  30.  Similarly,  imports 
of  sugar  and  molasses  are  included  only  from  April  1  to  June 
30,  the  first  full  quarter  in  which  the  reduced  rates  (effective 
Mar.  1,  1914)  were  in  force.  Wool  and  manufactures  of 
wool  are  similarly  included  only  for  the  last  two  quarters, 
and  sugar  and  molasses  only  for  the  final  quarter  of  the  state- 
ment covering  the  nine  months  ending  June  30,  1913. 

By  this  means  returns  may  be  compared  with  no  overlap- 
ping of  tariffs.  Throughout  the  later  period  the  Underwood- 
Simmons  rates  were  in  force,  and  throughout  the  earlier 
period,  October,  1912,  to  June,  1913,  the  Payne-Aldrich  rates 
were  in  force. 

The  average  rate  of  duty  from  October  1-,  1912,  to  June  30, 
1913,  was  15.5  per  cent  ad  valorem,  calculated  on  total  im- 
ports, and  37.8  per  cent  ad  valorem,  calculated  exclusively 
on  dutiable  imports.  Similarly,  the  average  rate  of  duty 
from  October  4,  1913,  to  June  30,  1914,  was  12.3  per  cent  ad 
valorem,  calculated  on  total  imports,  and  34  per  cent  ad  va- 
lorem, calculated  on  dutiable  imports.  In  both  periods  wool 
and  manufactures  of  wool  are  included  only  for  the  last  six 
months  and  sugar  and  molasses  only  for  the  last  three  months. 



The  comparison  of  the  results  under  the  two  periods  shows 
that  for  approximately  nine  months  under  the  present  tariff 
the  average  rate  of  duty  was  3.2  per  cent  ad  valorem  less 
than  under  the  former  tariff,  when  the  average  is  calculated 
on  total  imports,  and  3.8  per  cent  ad  valorem  less  than  under 
the  former  tariff,  when  the  average  is  calculated  on  dutiable 
imports. 

An  increase  or  decrease  in  the  level  of  duties  may  be  com- 
pared on  two  bases  :  On  the  value  of  imports  or  on  the  former 
average  duty.  Suppose  that  all  articles  imported  were  sub- 
ject to  a  uniform  rate  of  duty  of  10  per  cent  ad  valorem  and 
a  new  law  was  passed  substituting  a  uniform  rate  of  duty  of 
8  per  cent  ad  valorem.  In  such  a  case  the  reduction  might 
be  described  either  as  2  per  cent  ad  valorem  (that  is,  2  per 
cent  of  the  value  of  the  imports)  or  20  per  cent  of  the  former 
duty  (2  divided  by  10). 

The reduction in duty under the present tariff, from
15.5 to 12.3 per cent ad valorem calculated on total imports,
means that for the same amount of imports the customs receipts
were reduced about 20 per cent of the former duty
(3.2  divided  by  15.5).  Similarly,  the  calculation  on  dutiable 
imports  alone  shows  a  reduction  from  37.8  to  34  per  cent  ad 
valorem.  These  latter  figures  indicate  that  the  reduction, 
considering  only  the  goods  that  remained  subject  to  duty, 
represented  approximately  10  per  cent  of  the  former  ad  va- 
lorem duty.  The  net  effects  of  the  law,  so  far  as  shown  by 
the  nine-month  returns,  are  therefore  a  considerable  increase 
in  the  free  list,  namely,  from  59.2  to  63.8  per  cent  of  the  total 
imports,  and  a  reduction  averaging  10  per  cent  in  the  rates 
of  duty  on  the  articles  that  remained  subject  to  duty. 
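The two bases of comparison described above can be checked with a few lines of arithmetic. The following Python sketch is offered only as an illustration: the function name `duty_reduction` is an assumption of this example, while the rates are those quoted in the text.

```python
def duty_reduction(old_rate, new_rate):
    """Express a change in the average ad valorem rate on both bases.

    Returns (points, percent_of_former), where `points` is the drop in
    per cent of import value and `percent_of_former` is that drop taken
    as a percentage of the former rate.
    """
    points = old_rate - new_rate
    percent_of_former = 100.0 * points / old_rate
    return points, percent_of_former


# Hypothetical uniform tariff used in the text: 10 per cent cut to 8 per cent.
uniform_pts, uniform_rel = duty_reduction(10.0, 8.0)

# Nine-month averages quoted in the text (per cent ad valorem).
total_pts, total_rel = duty_reduction(15.5, 12.3)        # on total imports
dutiable_pts, dutiable_rel = duty_reduction(37.8, 34.0)  # on dutiable imports

for label, pts, rel in [
    ("uniform example", uniform_pts, uniform_rel),
    ("total imports", total_pts, total_rel),
    ("dutiable imports", dutiable_pts, dutiable_rel),
]:
    print(f"{label}: {pts:.1f} points, {rel:.1f} per cent of former rate")
```

The same reduction therefore reads as roughly 3 points on one basis and roughly 20 per cent on the other, which is why the basis must always be stated.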

Unfortunately  the  average  rate  of  duty  can  not  be  shown 
for  the  six  groups  of  articles  classified  according  to  use  and 
degree of manufacture. For the principal items, however,
comprising 97 per cent of the total imports, a separation has
been  made  between  manufactured  articles  and  unmanufac- 
tured articles.  The  average  rate  of  duty  on  articles  classified 
as  manufactured  was  23.8  per  cent  ad  valorem  in  the  nine 
months  ending  June  30,  1913,  and  19.8  per  cent  during  the 
nine  months  ending  June  30,  1914.  Calculated  on  the  basis 
of  dutiable  imports  only,  the  corresponding  percentages  are 
37.1  and  34.4  per  cent  ad  valorem.  In  the  case  of  unmanu- 
factured goods,  the  average  ad  valorem  rate  of  duty  on  total 
imports  was  6.2  per  cent  during  the  nine  months  ending 
June 30, 1913, and 4.4 per cent during the nine months ending
June 30, 1914. The corresponding figures based on dutiable
imports only were, respectively, 41.6 and 32.9 per cent
ad  valorem.  It  therefore  appears  that  the  decrease  in  the 
average  rate  of  duty  was  much  more  marked  in  the  case  of 
unmanufactured  than  manufactured  articles.  Taking  ac- 
count of  total  imports,  the  reduction  (4  per  cent  ad  valorem) 
in  the  rate  of  duty  on  manufactured  goods  was  16.8  per  cent 
of  the  former  rate  of  duty  on  manufactured  goods  and  the 
reduction  (1.8  per  cent  ad  valorem)  in  the  rate  of  duty  on 
unmanufactured  goods  was  29  per  cent  of  the  former  rate  of 
duty  on  unmanufactured  goods.  Taking  account  only  of 
dutiable  imports,  the  reduction  in  the  rate  of  duty  amounted 
to  7.1  per  cent  of  the  former  rates  in  the  case  of  manufactured 
goods  and  20.9  per  cent  of  the  former  rates  in  the  case  of  un- 
manufactured goods. 

It  is  of  interest  in  this  connection  to  compare  the  effect 
of  the  present  tariff  with  that  of  previous  tariffs.  The  in- 
fluence of  the  enactment  of  the  Payne-Aldrich  Tariff  Act 
is  seen  best  in  a  comparison  of  the  returns  for  the  fiscal  years 
1909  and  1911,  which  represent,  respectively,  the  last  full 
year of operation of the Dingley Tariff Act and the first full
year of operation of the Payne-Aldrich Act. On the basis of
total imports the average rates of duty were 23 per cent ad
valorem  in  the  fiscal  year  1909,  and  20.3  per  cent  ad  valorem 
in  the  fiscal  year  1911,  the  2.7  per  cent  ad  valorem  decrease 
representing  a  reduction  of  11.7  per  cent  of  the  former  rates 
of  duty.  On  the  basis  of  dutiable  imports  only,  the  average 
rates  of  duty  were  43.1  per  cent  ad  valorem  in  1909,  and  41.2 
per  cent  ad  valorem  in  1911,  showing  a  reduction  of  nearly 
2  per  cent  ad  valorem,  or  about  4.5  per  cent  of  the  former 
rates. 

Naturally,  the  percentage  of  free  goods  is  larger  for  un- 
manufactured than  for  manufactured  goods,  under  both  the 
Payne-Aldrich  tariff  and  the  Underwood-Simmons  tariff. 
For  the  nine  months  ending  June  30,  1914,  under  the  Under- 
wood-Simmons Act,  the  average  ad  valorem  duty  on  dutiable 
imports  was  higher  for  manufactured  than  for  unmanu- 
factured goods,  while  for  the  nine  months  ending  June  30, 
1913,  under  the  Payne-Aldrich  tariff,  the  ad  valorem  duty 
on  unmanufactured  goods  was  slightly  higher  than  the  cor- 
responding rate  for  manufactured  goods. 

This  higher  average  rate  on  unmanufactured  goods  was 
due  to  the  fact  that  a  few  unmanufactured  articles,  of  which 
considerable  quantities  were  imported,  had  a  very  high  aver- 
age duty.  Thus,  during  the  nine  months'  period  ending 
June  30, 1913,  the  average  ad  valorem  rate  of  duty  on  tobacco, 
which  constituted  nearly  one-fourth  of  the  total  imports  of 
unmanufactured  articles,  was  83.77  per  cent ;  wool,  which 
constituted  over  one-eighth  of  such  imports,  had  an  average 
ad  valorem  rate  of  44.26  per  cent ;  lead  ore  had  an  ad  valorem 
rate  of  99.96  per  cent ;  zinc  ore,  45.04  per  cent ;  and  hay, 
55.06  per  cent.  These  articles,  except  wool,  which  is  now 
on the free list, had the following average ad valorem rates
of  duty  in  the  nine  months  ending  June  30,  1914 :  Tobacco, 
82.32;  lead  ore,  23.88;  zinc  ore,  10;  hay,  20.44. 



A  gradual  reduction  in  the  average  ad  valorem  rate  of 
duty  during  a  period  without  tariff  change  is  at  first  sight 
a  surprising  phenomenon.  From  the  fiscal  year  1899  to  the 
fiscal  year  1909,  the  period  during  which  the  Dingley  tariff 
was  in  force,  the  average  ad  valorem  rate  of  duty  on  total 
imports  decreased  from  29.5  to  23  per  cent,  and  the  average 
ad  valorem  rate  on  dutiable  imports  decreased  from  52.1  to 
43.2  per  cent.  There  was,  thus,  under  the  same  tariff,  a 
decrease  in  the  rate  of  duty  of  6.5  per  cent  ad  valorem,  based 
on  total  imports,  and  of  8.9  per  cent  ad  valorem,  based  on 
dutiable  imports,  or,  in  other  words,  an  average  reduction, 
respectively,  of  22  and  17.1  per  cent  of  the  rates  in  force  in 
1899.  From  1911  to  1913,  the  first  and  the  last  full  years 
during  which  the  Payne-Aldrich  tariff  was  in  effect,  there 
was  a  similar,  though  less  pronounced,  reduction  in  the  aver- 
age ad  valorem  rate  of  duty. 

This  tendency  is  due  to  the  gradual  rise  in  prices.  Specific 
rates  of  duty,  which  were  largely  used  in  both  the  Dingley 
and  Payne-Aldrich  tariff  acts,  remain  unchanged  as  prices 
rise  or  fall,  with  the  result  that  the  equivalent  ad  valorem 
rate  continually  falls  as  prices  rise.  The  effect  of  this  ten- 
dency is  obviously  to  exaggerate  the  apparent  reduction  in 
duty  when  duties  are  lowered,  and  to  minimize  the  apparent 
effect  when  they  are  raised. 
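The mechanism can be illustrated with a short Python sketch. The duty and the prices below are hypothetical figures chosen only for illustration, and the function name `ad_valorem_equivalent` is an assumption of this example; nothing here is taken from the tariff schedules themselves.

```python
def ad_valorem_equivalent(specific_duty, unit_price):
    """Per cent of value represented by a fixed (specific) duty per unit."""
    return 100.0 * specific_duty / unit_price


# Hypothetical article: a specific duty of 10 cents per unit, unchanged by law,
# while the import price rises from 40 to 60 cents.
specific_duty = 0.10
prices = [0.40, 0.50, 0.60]

equivalents = [ad_valorem_equivalent(specific_duty, p) for p in prices]

for price, rate in zip(prices, equivalents):
    print(f"price ${price:.2f}: equivalent ad valorem rate {rate:.1f} per cent")
```

The statute is the same in every line of the output; only the price has moved, yet the computed ad valorem rate falls steadily.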

The  close  correspondence  between  the  estimated  receipts 
and  the  actual  receipts  under  the  Underwood-Simmons  tariff 
is  striking.  It  was  estimated  that  the  measure,  as  it  passed 
the  House  of  Representatives,  would  produce  during  its  first 
full  year  of  operation  $258,000,000 ;  as  it  passed  the  Senate, 
$248,000,000;  and  as  it  was  finally  enacted,  $249,000,000. 
Since  the  new  rates  on  sugar  and  molasses  became  effective 
March  1,  1914,  the  law  was  in  full  operation  only  five  months 
before the outbreak of the European war. This covered
only one full quarter, that extending from April 1 to June 30,
1914. During that quarter the duties amounted to $63,600,000,
and at this rate the returns for a full year would have
been $254,000,000. The receipts, therefore, exceeded by
some  $5,000,000  the  expected  proceeds. 
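The annual figure is simply the one full quarter multiplied by four. A minimal Python check, using only the amounts quoted in the text (the variable names are assumptions of this sketch):

```python
# Duties collected April 1 to June 30, 1914, as quoted in the text.
quarter_receipts = 63_600_000

# Estimated first-year yield of the act as finally enacted.
enacted_estimate = 249_000_000

# Project the single full quarter to a year, assuming (as the text does)
# that each quarter would yield the same amount.
annualized = 4 * quarter_receipts
excess = annualized - enacted_estimate

print(f"annualized receipts: ${annualized:,}")
print(f"excess over estimate: ${excess:,}")
```

The projection rests on one quarter only, so it ignores any seasonal movement in imports.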

Owing  to  the  retention  of  the  old  duties  on  wool,  manu- 
factures of  wool,  sugar,  and  molasses,  it  is  difficult  to  make 
direct  comparison  for  the  two  quarters  ending,  respectively, 
December  31,  1913,  and  March  31,  1914.  The  excess  re- 
ceipts on  this  account  during  the  first  quarter  under  the  new 
tariff  may  be  estimated  at  $3,600,000  and  during  the  second 
quarter  at  $1,100,000.  Deducting  these  amounts  from  the 
actual  receipts  during  the  two  quarters,  the  revenue,  had 
the  act  of  1913  been  fully  in  force,  would  have  been  approxi- 
mately $61,700,000  during  the  quarter  ending  December  31, 
1913,  and  $65,300,000  during  the  quarter  ending  March  31, 
1914  —  in  the  one  case  just  about  $2,000,000  less  and  in  the 
other  case  just  about  $2,000,000  more  than  the  amount  of 
duty  during  the  quarter  ending  June  30,  1914,  when  the  new 
act  was  in  full  force. 

Averages as Measures of Street Car Utilization ¹

¹ Adapted with permission from Annual Report of the Public Service
Commission of the First District of the State of New York, 1912,
Vol. II, pp. 91-97.

Utilization of Cars. Degree of utilization of cars is a
relative conception which may be analyzed into several different
relations, with corresponding ratios. A certain average
number of separate cars is used by a given company more or
less in each single day throughout the year. The number of
cars in the possession of the company will naturally be in
excess of the number used on any particular day; or if rarely
it happens that every car both can be and is put out at some
time  during  the  day,  the  average  number  for  the  365  days 
will  nevertheless  necessarily  be  considerably  smaller  than 
the  number  of  cars  in  the  possession  of  the  company,  pro- 
vided the  company  has  a  volume  of  business  worth  notice. 
The  ratio  of  the  number  of  cars  possessed  to  the  average 
number  used  is  thus  a  measure  of  the  reserve  supply  of 
cars  kept  to  provide  for  accidents,  repairs,  and  emergencies 
of  all  sorts.  But  this  ratio  is  subject  to  qualification  with 
reference  to  cars  designed  for  use  at  only  one  season  of  the 
year.  Open  cars  —  or  strictly  speaking,  open  car  bodies, 
since  the  same  trucks  and  motors  are  usually  put  under 
closed  car  bodies  in  winter  —  may  take  the  place  of  most, 
if  not  all,  the  closed  cars  during  several  summer  months, 
thus  giving  an  opportunity  for  thorough  overhauling  and 
making  it  possible  to  get  along  with  a  smaller  reserve 
in  winter.  If  the  peak  of  the  demand  upon  the  company 
comes  in  summer,  however,  the  open  cars  may  meet  that 
need  in  a  way  to  make  unnecessary  a  supply  of  closed  cars 
sufficient  to  meet  the  maximum  demand  of  the  year.  With 
this  qualification,  it  is  the  ratio  of  cars  usable  all  the  year, 
that  is,  closed  cars  and  convertible  cars,  to  the  average  num- 
ber used  that  gives  us  a  measure  of  the  necessary  reserve 
supply.  .  .  .  This  is  of  course  a  much  mixed  average  for 
all sorts of transit and cars. It is obviously much affected
in the case of two items by the employment of open cars to
meet  a  summer  maximum  demand.  .  .  . 
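A small Python sketch may make the reserve ratio concrete. Every figure below is hypothetical, and the function name `reserve_ratio` is an assumption of this example; the report's own company figures are not reproduced here.

```python
def reserve_ratio(cars_available, daily_cars_used):
    """Ratio of cars available to the average number used per day."""
    average_used = sum(daily_cars_used) / len(daily_cars_used)
    return cars_available / average_used


# Hypothetical company (all figures invented for illustration).
closed_cars = 80
convertible_cars = 20
open_cars = 30  # usable in summer only

# Hypothetical counts of cars put out on five sample days.
daily_cars_used = [70, 80, 75, 85, 90]

# Crude ratio: every car in the company's possession.
ratio_all_cars = reserve_ratio(
    closed_cars + convertible_cars + open_cars, daily_cars_used
)

# Qualified ratio: only cars usable all the year (closed and convertible).
ratio_year_round = reserve_ratio(closed_cars + convertible_cars, daily_cars_used)

print(f"all cars owned: {ratio_all_cars:.3f}")
print(f"year-round cars: {ratio_year_round:.3f}")
```

Counting the summer-only cars overstates the reserve, which is the qualification the passage insists upon.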

Another phase of the utilization of cars is reflected by the
car miles and car hours operated per car during the year.
This comparison can be made either with the cars in the
possession of the company or with the average number used.
... It appears that, in terms of car miles per average number
of cars used, the rapid transit lines in general make a
considerably greater average use of their cars than do the
surface lines, but there are some companies having traffic of
an  interurban,  or  almost  interurban,  character  with  higher 
ratios  than  the  rapid  transit  lines.  The  ratio  of  the  Hudson 
&  Manhattan  is  not  to  be  accepted  at  its  face  value  owing  to 
the  unsatisfactory  way  in  which  this  company  determines 
the  "average"  number  of  cars  used.  In  terms  of  car  hours, 
on  the  other  hand,  the  degree  of  use  made  of  rapid  transit 
cars  is  rather  less  than  the  average  for  the  city  as  a  whole. 
The  outlying  surface  lines,  it  appears,  make  a  very  full  use 
of  such  rolling  stock  as  they  use  at  all,  that  is,  the  proportion 
of  cars  used  to  cars  owned  is  low,  but  that  of  car  miles  to 
cars  used  is  high.  All  these  ratios,  however,  are  somewhat 
subject  to  qualification,  owing  to  the  lack  of  sharp  definition 
of  the  average  number  of  cars  used  per  day  and  of  careful 
conformity  to  it  on  the  part  of  the  companies.   .  .   . 

Seat mileage operated is not, or should not be, the crude
product of the average seating capacity times the number of
car miles operated, and therefore the quotient obtained by
reversing this process, that is, the ratio of seat miles operated
to car miles operated, should not be expected to coincide
with the average seating capacity of cars owned. The contribution
of open cars to car mileage will obviously not be in
proportion to their number, since they are operated only
during the summer months, while their contribution to seat
mileage will be considerably more than in proportion to their
car mileage, owing to their large size in terms of seating capacity.

But the two ratios are near enough together to indicate
the substantial accuracy of the seat-mile return, although an
incorrect attribution of seating capacity to cars in the first
instance would affect both similarly. The fact that the average
seating capacity of cars in the possession of the companies
is a trifle smaller than of cars operated may be explained
by a preferential use of the newer and on the average larger
cars.  On  the  other  hand,  the  open  cars,  with  their  larger 
seating  capacity,  should  have  more  influence  upon  the  aver- 
age for  cars  owned  or  leased,  owing  to  their  use  in  summer 
only.  This  factor  seems  to  have  been  counterbalanced  by 
the  one  just  referred  to.  .  .  . 
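The distinction between the average seating capacity of cars owned and the ratio of seat miles to car miles can be seen in a short Python sketch. The fleet below is entirely hypothetical and is chosen only to isolate the open-car effect; it is not drawn from the Commission's returns.

```python
# Hypothetical fleet: (type, number of cars, seats per car, car miles per car per year).
# Open cars are larger but run in summer only, hence the smaller mileage.
fleet = [
    ("closed", 100, 40, 30_000),
    ("open", 50, 60, 10_000),
]

cars = sum(n for _, n, _, _ in fleet)
seats_owned = sum(n * s for _, n, s, _ in fleet)
car_miles = sum(n * m for _, n, _, m in fleet)
seat_miles = sum(n * s * m for _, n, s, m in fleet)

# Simple average over cars owned: each car counts once.
capacity_owned = seats_owned / cars

# Seat miles divided by car miles: each car counts in proportion to its mileage.
capacity_in_use = seat_miles / car_miles

# Shares of the open cars under the three measures.
open_share_of_cars = 50 / cars
open_share_of_car_miles = (50 * 10_000) / car_miles
open_share_of_seat_miles = (50 * 60 * 10_000) / seat_miles

print(f"average capacity, cars owned: {capacity_owned:.2f}")
print(f"seat miles per car mile: {capacity_in_use:.2f}")
```

In this invented case the open cars are a third of the fleet but a much smaller share of car mileage, and their share of seat mileage lies between the two, as the passage describes. Preferential use of newer and larger cars would pull the second average the other way.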

The  greater  the  seating  capacity  the  fuller  is  the  utiliza- 
tion by  a  company  of  its  individual  cars.  An  appreciation 
of  this  fact  accounts  for  the  tendency  of  street  railways  to 
use larger cars. Traffic conditions, however, limit the possibility
of taking advantage of this economy. For some types
of  service,  moreover,  facility  in  loading  and  unloading  is  more 
important  than  additional  seats. 

REVIEW 

1.  With  what  types  of  units  does  this  article  deal  ?  Consult  the 
Text,  Chapter  III,  and  Professor  Bowley's  notion  of  "relativity"  in 
The  Nature  and  Condition  of  Statistical  Measurement,  in  Chapter  III, 
supra. 

2.  What  is  the  "measure  of  the  necessary  reserve  supply"  of 
cars?     Why  is  this  a  "much  mixed  average"? 

3.  What  conditions  would  affect  a  comparison  of  car  hours,  and 
car  miles  on  a  given  line,  and  on  different  lines? 

4.  Define  the  unit  seat  mile.  How  may  the  number  of  seat 
miles  be  calculated  for  a  given  line?  What  effect  has  the  greater 
use  of  new  cars,  and  of  the  larger  cars  on  this  average? 

Car-Seat Mile Averages and Ratios ¹

¹ Adapted with permission from Annual Report of the Public Service
Commission of the First District of the State of New York, 1913,
Vol. II, pp. 76-78.

Car-seat Mile Ratios. Ratios of seat miles to passengers
are better comparable as between companies and between
different years than car-mile ratios, since allowance is made
for difference in the seating capacity of cars. The table below
gives such ratios, as well as ratios of seat miles to car
miles, for the main groups of companies.

Seat Miles in Relation to Passengers and to Car Miles, 1912 and 1913
(Diff. = points difference between ratios; ".." = no entry shown)

                            Seat miles to passengers   Seat miles to car miles
Roads                         1912    1913   Diff.       1912    1913   Diff.

Hudson & Manhattan            5.65    5.79   +0.14      44.00   44.00     ..
Interborough                 10.66   10.28   -0.38      49.95   49.97   +0.02
  Rapid Transit subway       10.85   10.21   -0.64      52.00   52.00     ..
  Manhattan elevated         10.47   10.35   -0.12      48.00   48.00     ..
Brooklyn Rapid Transit        8.53    8.29   -0.24      47.23   47.38   +0.15
  Elevated division           9.98    9.79   -0.19      52.12   52.11   -0.01
  Surface division            7.63    7.39   -0.24      43.89   44.19   +0.30
Bridge locals                 2.41    2.34   -0.07      34.99   33.67   -1.32
  Brooklyn bridge             2.36    2.24   -0.12      35.95   36.00   +0.05
  Williamsburg bridge         2.30    2.01   -0.29      37.63   37.63     ..
  Queensboro bridge           2.91    3.05   +0.14      28.00   28.00     ..
  Manhattan bridge              ..    3.47     ..          ..   28.00     ..
Manhattan surface             5.75    5.39   -0.36      40.11   40.98   +0.87
  Electric contact            5.79    5.43   -0.36      41.55   42.54   +0.99
  Storage battery             5.25    4.98   -0.27      21.87   22.86   +0.99
  Horse                       4.90    4.42   -0.48      23.22   22.60   -0.62
Bronx surface                 9.74    9.69   -0.05      45.24   45.87   +0.63
  Trolley                     9.80    9.74   -0.06      45.29   45.91   +0.62
  Monorail electric           3.12    3.27   +0.15      43.96   43.84   -0.12
  Horse                       1.61    2.49   +0.88      17.18   21.52   +4.34
Brooklyn, excl. B.R.T.        9.84    9.25   -0.59      48.60   47.56   -1.04
Queens, excl. B.R.T.          9.66    8.92   -0.74      43.52   43.42   -0.10
Richmond                      8.94    9.03   +0.09      37.79   39.25   +1.46

Underground                  10.01    9.53   -0.48      51.16   51.14   -0.02
Elevated                     10.29   10.15   -0.14      49.37   49.37     ..
  Total rapid transit        10.17    9.87   -0.30      50.11   50.11     ..
Conduit electric              5.79    5.43   -0.36      41.57   42.55   +0.98
Trolley                       8.12    7.84   -0.28      44.05   44.31   +0.26
Storage battery               5.25    4.98   -0.27      21.87   22.86   +0.99
Monorail electric             3.12    2.90   -0.22      43.96   38.82   -5.14
  Total electric surface      7.05    6.74   -0.31      42.94   43.46   +0.52
Horse                         4.86    4.39   -0.47      23.18   22.59   -0.59

Grand total                   8.58    8.26   -0.32      46.66   46.91   +0.25

Ratios for grand totals of prior years

                            Seat miles to passengers   Seat miles to car miles
                            Earlier   Later            Earlier   Later
                              year    year   Diff.       year    year   Diff.

1911 and 1912                 8.60    8.58   -0.02      46.60   46.66   +0.06
1910 and 1911                 8.47    8.60   +0.13      46.28   46.60   +0.32

The  most  striking  feature  of  the  table  is  the  decrease  in 
the  ratio  of  seat  miles  to  passengers  which  took  place  in  1913 
in  the  case  of  nearly  every  group  shown  in  the  table.  This 
is  due  of  course  to  the  smaller  increase  in  accommodations 
than  in  passengers,  which  has  already  been  noted,  and  is  of 
advantage  to  the  companies  and  likely  to  be  to  the  disadvan- 
tage of  the  traveling  public.  The  Queens  roads  profited 
most  in  this  respect,  and  the  Interborough  subway  was  next. 
In  1912  the  latter  gave  the  greatest  service  in  exchange  for 
a  nickel  as  measured  by  seat  miles,  but  in  1913  the  amount 
of  such  service  was  surpassed  by  that  offered  by  the  Manhat- 
tan elevated.  To  one  learning  the  fact  for  the  first  time,  it 
appears  surprising  that  the  most  congested  of  all  lines,  the 
Manhattan  and  Brooklyn  elevated  and  the  Interborough 
subway,  should  give  the  greatest  number  of  seat  miles  per 
passenger.  The  high  ratios  are  of  course  due  to  the  un- 
usually long  average  ride  taken  by  passengers  and  to  the 
immense  number  of  empty  seats  during  the  last  mile  or  two 
of  the  trip  to  the  outskirts  of  the  thickly  settled  portion  of 
the  city.  The  small  ratios  of  the  bridge  locals,  the  monorail 
and  the  Bronx  horse  cars,  are  of  course  due  to  the  shortness 
of  the  route.  .  .  . 
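The comparisons drawn in this paragraph can be reproduced from the table. The Python sketch below uses a selection of rows only (the choice of rows and the variable names are assumptions of this example), with the ratios of seat miles to passengers as printed for 1912 and 1913.

```python
# Seat miles per passenger, (1912, 1913), for selected groups in the table.
ratios = {
    "Hudson & Manhattan": (5.65, 5.79),
    "Interborough subway": (10.85, 10.21),
    "Manhattan elevated": (10.47, 10.35),
    "B.R.T. elevated division": (9.98, 9.79),
    "Bridge locals": (2.41, 2.34),
    "Queens, excl. B.R.T.": (9.66, 8.92),
    "Grand total": (8.58, 8.26),
}

# Points difference between the ratios of the two years.
changes = {group: round(y1913 - y1912, 2) for group, (y1912, y1913) in ratios.items()}

# Group with the largest fall, and the highest ratio in each year.
largest_drop = min(changes, key=changes.get)
top_1912 = max(ratios, key=lambda g: ratios[g][0])
top_1913 = max(ratios, key=lambda g: ratios[g][1])

for group, change in changes.items():
    print(f"{group}: {change:+.2f}")
print("largest decrease:", largest_drop)
print("highest ratio, 1912:", top_1912, "| 1913:", top_1913)
```

Among the rows selected, the result agrees with the text: the Queens roads show the largest decrease, and the highest ratio passes from the Interborough subway in 1912 to the Manhattan elevated in 1913.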

The  ratios  of  seat  miles  to  car  miles  are  equivalent  to  the 
average seating capacities of cars actually in use, as distinguished
from the average seating capacity of all cars owned or
leased. ... For several years the average capacity has
continuously been increasing for the city as a whole, due
both  to  the  increasing  proportion  of  rapid-transit  traffic, 
which  employs  cars  of  comparatively  large  seating  capacity, 
and  to  the  installation  of  new  and  larger  cars  on  the  surface 
lines.  The  most  marked  increases  in  seating  capacity  since 
1910  —  the  first  year  for  which  figures  for  seat  miles  are 
available  —  are  shown  for  the  Manhattan  and  Richmond 
surface  roads,  6.6  per  cent,  which  is  almost  equaled  by  the 
increase  for  Bronx  surface  roads,  6.5  per  cent.  The  average 
capacity has not changed at all during the 3-year period
for the Hudson & Manhattan, Interborough subway, Manhattan
elevated, and Queensboro bridge locals. It has slightly
decreased  for  the  Brooklyn  Rapid  Transit  elevated  and  sur- 
face, the  other  Brooklyn  roads,  the  Queens  roads,  and  the 
Williamsburg  and  Brooklyn  bridge  locals.  In  the  case  of 
the  Brooklyn  and  Queens  surface  roads,  the  decrease  is  prob- 
ably due  to  a  decrease  in  the  proportionate  use  of  open  cars, 
which  have  a  considerably  larger  seating  capacity  than  closed 
cars  of  the  same  size.  For  the  Brooklyn  elevated  lines  on 
elevated  structures,  the  average  seating  capacity  slightly 
increased,  the  decrease  for  the  elevated  division  as  a  whole 
being  due  to  the  change  on  the  South  Brooklyn  and  Sea  Beach 
lines  on  the  surface  over  which  "elevated"  trains  run. 

REVIEW 

1.  Put  into  the  form  of  a  general  statement  the  conditions  that 
should  be  observed  in  comparing  the  seat  miles  of  two  different 
lines. 

2.  How  is  this  discussion  related  to  the  contention  of  the  Text 
that  "like  can  be  compared  only  with  like"? 

3.  What  conditions  have  operated  to  change  the  ratios  of  seat 
miles  to  car  miles? 


REVIEW PROBLEMS

Averages

1. Using the data in the table below:

(1)  Compute  the  arithmetic  average  expenditure  for  breakfast, 
dinner,  and  supper. 

(2)  Compute  the  median  expenditure  —  to  the  nearest  group 
and  also  to  the  nearest  cent  —  for  breakfast,  dinner,  and  supper. 

Table Showing the Expenditures for Food by Men and Women and by Meals

                              Meals and Purchasers of Food
Expenditure
groups             Total              Breakfast             Dinner               Supper
(cents)       Total   Men  Women   Total   Men  Women   Total   Men  Women   Total   Men  Women

Total          6843  2897   3946     836   359    477    3233  1391   1842    2774  1147   1627

3 to 7           15     7      8       5     1      4       6     2      4       4     4      -
8 to 12         188    64    124      84    25     59      57    12     45      47    27     20
13 to 17        516   150    366     252    91    161     183    39    144      81    20     61
18 to 22        763   230    533     220    87    133     356    98    258     187    45    142
23 to 27        982   343    639     134    65     69     552   186    366     296    92    204
28 to 32        849   345    504      70    42     28     497   211    286     282    92    190
33 to 37        672   315    357      34    25      9     350   179    171     288   111    177
38 to 42        758   334    424      19    11      8     336   174    162     403   149    254
43 to 47        702   351    351      14    12      2     297   164    133     391   175    216
48 to 52        563   307    256       2     -      2     224   120    104     337   187    150
53 to 57        407   223    184       1     -      1     159    89     70     247   134    113
58 to 62        179   106     73       -     -      -      77    47     30     102    59     43
63 to 67        100    49     51       -     -      -      60    29     31      40    20     20
68 to 72         75    37     38       -     -      -      38    23     15      37    14     23
73 to 77         36    16     20       -     -      -      18     7     11      18     9      9
78 to 82         17    11      6       -     -      -       9     5      4       8     6      2
83 and over      21     9     12       1     -      1      14     6      8       6     3      3

(3)  Locate  the  modal  expenditure  —  to  the  nearest  group  and 
also  to  the  nearest  cent  —  for  breakfast,  dinner,  and  supper. 
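By way of illustration only, the three averages called for in Problem 1 may be sketched in Python for the breakfast totals. Several choices here are assumptions of this sketch rather than the method of the Text: group midpoints stand for every item in a group, the open group "83 and over" is given a midpoint of 85, group boundaries are taken half a cent below each lower limit, and the mode is interpolated from the frequencies of the adjoining groups.

```python
WIDTH = 5                                   # each group spans 5 cents
lower_limits = list(range(3, 84, WIDTH))    # 3, 8, ..., 83
midpoints = [lo + 2 for lo in lower_limits] # 5, 10, ..., 85 (85 is assumed)

# Breakfast, total purchasers, by expenditure group (from the table above).
breakfast = [5, 84, 252, 220, 134, 70, 34, 19, 14, 2, 1, 0, 0, 0, 0, 0, 1]


def grouped_mean(freqs, mids):
    """Arithmetic average, each group represented by its midpoint."""
    return sum(f * m for f, m in zip(freqs, mids)) / sum(freqs)


def grouped_median(freqs, lowers, width):
    """Median by straight-line interpolation within the median group."""
    half = sum(freqs) / 2
    cumulative = 0
    for f, lo in zip(freqs, lowers):
        if f > 0 and cumulative + f >= half:
            return (lo - 0.5) + (half - cumulative) / f * width
        cumulative += f


def grouped_mode(freqs, lowers, width):
    """Mode interpolated from the groups on either side of the modal group."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i + 1 < len(freqs) else 0
    d1, d2 = freqs[i] - before, freqs[i] - after
    return (lowers[i] - 0.5) + d1 / (d1 + d2) * width


mean = grouped_mean(breakfast, midpoints)
median = grouped_median(breakfast, lower_limits, WIDTH)
mode = grouped_mode(breakfast, lower_limits, WIDTH)

print(f"mean {mean:.2f}, median {median:.2f}, mode {mode:.2f} cents")
```

The dinner and supper columns can be passed to the same functions. Other conventions for boundaries or for locating the mode will give somewhat different results, which bears on the questions raised in Problem 2.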

2. Compare the averages (arithmetic means), medians, and modes.

(1)  Arrange  these  in  the  form  of  a  table.  Give  the  same  a  proper 
title  and  express  comparatively  in  the  table  the  relations  which  they 
bear  to  each  other. 

(2)  How  nearly  is  the  contention  in  the  Text  realized,  that  these 
averages  for  series,  not  too  asymmetrical,  stand  in  a  definite  re- 
lationship ? 

(3)  How  differently,  if  at  all,  would  you  interpret  these  averages 
if  the  series  were  continuous  rather  than  discrete? 

3.  By  the  use  of  the  averages  computed  in  this  problem,  verify 
the  properties  of  averages  as  described  on  pp.  279-289  of  the  Text. 
How  satisfactory  would  it  be  solely  to  speak  of  these  expenditures 
in  terms  of  averages? 

4.  Using  the  data  above,  but  reduced  to  percentages, 

(1)  Compare,  by  using  simple  percentages,  frequency  graphs 
drawn  on  a  single  figure  and  to  a  common  scale,  the  expenditures 
for  breakfast,  dinner,  and  supper. 

(2) Locate the mode graphically and compare your figure with
that determined arithmetically in Problem 1, part (3).

(3) Indicate on the graphs the positions of the medians and
arithmetic means, as determined in Problem 1, parts (1) and (2).
What order do they have? Are they equally distant apart? Absolutely?
Test by reference to Problem 2. Express the relations graphically
by the use of bar diagrams.

5.  From  your  answers  to  Problems  1-4,  and  from  such  other 
computations  as  seem  to  you  to  be  necessary,  answer  the  following 
questions : 

(1)  Are  the  men  more  or  less  consistent  than  the  women  in  their 
expenditure  for  different  meals? 

(2)  In  their  expenditure  for  all  meals? 

(3)  How  do  you  measure  consistency? 

(4)  Do  your  graphs  in  Problem  4  help  you  to  answer  these  ques- 
tions ?     In  what  way,  if  at  all  ? 

6.  By  applying  an  entirely  different  set  of  weights  from  those 
used  by  the  Bureau  of  Labor,  see  p.  176,  calculate,  for  the  same  acci- 
dents, both  a  severity  rate  and  a  frequency  rate.  What  effect 
seems  to  be  assignable  to  the  weights?  Do  you  agree  with  the 
generalization that "the character of the weighting scale used becomes
comparatively unimportant"? Does your one illustration
serve  as  an  adequate  basis  for  giving  an  answer  ?     Try  other  weights. 


CHAPTER  VIII 

PRINCIPLES  OF  INDEX  NUMBER  MAKING  AND  USING 

Method  of  Computing  Index  Numbers  —  Bureau 
of Crop Estimates ¹

The  trend  of  prices  to  farmers  for  important  crops  is 
indicated in the following figures; the base, 100, is the
average  price  December  1  in  the  43  years  1866-1908,  of 
wheat,  corn,  oats,  barley,  rye,  buckwheat,  potatoes,  hay, 
flax,  and  cotton : 


            1919    1918    1917    1916    1915    1914    1913    1912    1911    1910

Jan. 1     272.4   264.1   183.6   129.0   126.7   132.5   110.9   133.9   118.6   134.1
Feb. 1     259.9   271.6   195.6   139.9   140.5   132.1   112.6   140.2   119.8   138.5
Mar. 1     257.1   288.8   206.5   138.6   144.0   133.8   113.3   144.7   117.9   139.9
Apr. 1     271.2   288.6   225.2   140.2   144.5   134.2   113.6   153.4   118.0   138.8
May 1      293.7   281.8   280.6   143.3   150.0   135.9   116.2   166.3   122.2   133.5
June 1     307.2   271.9   291.3   145.8   147.3   138.8   121.2   168.3   127.7   133.5
July 1     310.2   272.9   289.9   144.8   139.1   137.7   122.9   160.1   136.3   133.1
Aug. 1        ..   280.6   307.8   147.7   138.9   137.6   125.4   148.0   148.2   137.1
Sept. 1       ..   293.3   279.6   161.5   132.5   141.3   136.3   137.6   141.6   137.0
Oct. 1        ..   289.3   277.0   163.6   128.2   136.4   139.1   128.6   138.0   129.8
Nov. 1        ..   266.5   261.3   178.8   124.4   127.4   133.9   118.3   135.6   122.2
Dec. 1        ..   265.5   252.3   187.9   120.4   122.8   132.7   110.3   133.1   118.4

¹ Taken with permission from "Monthly Crop Report," United States
Department of Agriculture, July, 1919, and August, 1918, pp. 67 and 96,
respectively.

The index numbers of prices as published by the Bureau
of Crop Estimates of the United States Department of
Agriculture are the result of:

(A) A comparison of the current price of each of 10 crops
with its average December 1 price for the 43-year period
1866-1908, and

(B)  A  combination  into  one  figure  of  the  10  index  num- 
bers thus  obtained  for  the  10  crops,  by  weighting  them 
with  figures  approximately  proportionate  to  the  importance 
of  the  several  crops  in  the  aggregate  value  of  the  10  crops 
for  a  series  of  years. 

These  processes  may  be  shown  as  follow^s : 


A

              (1)             (2)               (3)
              43-Year         Current Price     Index Number
              Average         (Apr. 1, 1917)    (Column 2 Divided by
              Dec. 1 Price                      Column 1 and
                                                Multiplied by 100) ¹

Wheat         $0.8450         $1.800            213.0
Corn            .4148          1.134            273.4
Oats            .3274           .615            187.8
Barley          .5747          1.023            178.0
Rye             .6271          1.356            216.2
Buckwheat       .6228          1.283            206.0
Potatoes        .5276          2.347            444.8
Hay            9.3820         13.050            139.1
Flax           1.0000          2.661            266.1
Cotton          .1000           .180            180.0


1 That is, per cent that current price is of 43-year average Dec. 1 price.

The following tabulations show the different steps in obtaining
the final index number, in their logical order. In office work,
however, a much simpler process is used, as the result of uniting
into a combination weight, or constant (the same for all months),
the various known factors, leaving only the single unknown factor
(current price) to be applied when determined at the time of the
report.

B

                     Index        Weight (Approximate         Extension
                     Number       Proportion of Aggregate
                                  Value of 10 Crops
                                  Represented by Each Crop) ¹

Wheat                213.0            176                      37,488.0
Corn                 273.4            325                      88,855.0
Oats                 187.8             93                      17,465.4
Barley               178.0             28                       4,984.0
Rye                  216.2              6                       1,297.2
Buckwheat            206.0              3                         618.0
Potatoes             444.8             55                      24,464.0
Hay                  139.1            172                      23,925.2
Flax                 266.1              7                       1,862.7
Cotton               180.0            135                      24,300.0

10 crops combined    225.2 ²        1,000                     225,259.5

This method of simplification by factoring may be shown as
follows:

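Representing current price by P and the crops by small initial
letters and analyzing operations called for in tabulations above,
we have —

\[
\begin{aligned}
176 \times 213.0 &= 176 \times \frac{P_w}{.8450} = \frac{176}{.8450} \times P_w = 208 \times P_w \\
325 \times 273.4 &= 325 \times \frac{P_c}{.4148} = \frac{325}{.4148} \times P_c = 783 \times P_c \\
93 \times 187.8 &= 93 \times \frac{P_o}{.3274} = \frac{93}{.3274} \times P_o = 284 \times P_o, \text{ etc.}
\end{aligned}
\]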


1 Obtained by multiplying 1909 production for each crop by 43-year
average price and dividing the resultant product by the aggregate of such
values (regarded as the base, or 1000) for the 10 crops.

2 Extension divided by weight.




Constants having been thus obtained once for all, the
operation to be performed at the time of the report is thereby
condensed into the simple operation of multiplying current
prices of the individual crops by their respective constants
and pointing off the sum of the extensions.

The  sum  of  the  extensions  is  practically  the  same  as  the 
sum  of  the  extensions  in  tabulation  B  —  they  are  identical 
if  both  operations  are  carried  out  to  the  same  degree,  no 
additional  factors  having  been  included  in  tabulation  C ; 
therefore  the  index  number  is,  as  in  tabulation  B,  the  sum 
of  the  extensions  divided  by  1000,  or  225.2. 
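
The agreement of the two routes can be checked with a short computation. The following is only a sketch in Python, using the prices and weights printed in the tabulations; the function names are illustrative, and the buckwheat base price is taken as .6228, the value implied by its printed index number of 206.0. Because nothing is rounded along the way, the result comes out a little above the published 225.2.

```python
# Figures from the tabulations for April 1, 1917.
# crop: (43-year average Dec. 1 price in dollars,
#        current price in dollars, weight out of 1000)
CROPS = {
    "wheat":     (0.8450, 1.800, 176),
    "corn":      (0.4148, 1.134, 325),
    "oats":      (0.3274, 0.615, 93),
    "barley":    (0.5747, 1.023, 28),
    "rye":       (0.6271, 1.356, 6),
    "buckwheat": (0.6228, 1.283, 3),   # assumed base price, see lead-in
    "potatoes":  (0.5276, 2.347, 55),
    "hay":       (9.3820, 13.050, 172),
    "flax":      (1.0000, 2.661, 7),
    "cotton":    (0.1000, 0.180, 135),
}


def index_by_relatives(crops):
    """Logical order: weight each crop's price relative, divide by total weight."""
    total_weight = sum(w for _, _, w in crops.values())
    extension = sum(w * (100 * cur / base) for base, cur, w in crops.values())
    return extension / total_weight


def combination_constants(crops):
    """Weight divided by base price: fixed once, the same for all months."""
    return {name: w / base for name, (base, _, w) in crops.items()}


def index_by_constants(crops, constants):
    """Office method: constant times current price in cents, summed, over 1000."""
    extension = sum(constants[name] * (100 * cur)
                    for name, (_, cur, _) in crops.items())
    return extension / 1000


constants = combination_constants(CROPS)
long_way = index_by_relatives(CROPS)
short_way = index_by_constants(CROPS, constants)
print(round(long_way, 2), round(short_way, 2))
```

Only the current prices change from month to month, so in practice the constants would be computed once and kept.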

For  April  1  the  results  were  as  follows  : 


C

                     Combination      Price             Extension
                     Weight, or       April 1, 1917
                     Constant         (Cents)

Wheat                    208             180.0           37,440.0
Corn                     783             113.4           88,792.2
Oats                     284              61.5           17,466.0
Barley                    49             102.3            5,012.7
Rye                       10             135.6            1,356.0
Buckwheat                  5             128.3              641.5
Potatoes                 104             234.7           24,408.8
Hay                     18.3           1,305.0           23,881.5
Flax                       7             266.1            1,862.7
Cotton                 1,350              18.0           24,300.0

10 crops combined                                       225,160.4 ¹


1 Index number, 225.2.

The ten crops considered in the index number comprise
nearly 90 per cent of the area in all field crops, the average
value per acre of which closely approximates the value
per acre of the aggregate of all crops. Therefore, the index
numbers based upon these crops may be regarded as practically
the same as if all the minor crops were included. The
December 1 price for 43 years, 1866-1908, was used because
it was the longest period of prices available when the
index numbers began, in 1908.

The Why and How of Stock Index Numbers ¹

In recent years index numbers of stock prices have
gained general acceptance: they are regularly "carried"
by the financial press; they are watched by bankers,
investors, and speculators; they are put before railway
commissions and courts as evidence; they are used in many
ways by publicists and economists. This acceptance,
however, is not the result of critical approval. Perhaps
the good repute which index numbers of commodity prices
at wholesale have fairly won after long discussions has
disposed most "consumers of statistics" to trust index
numbers as such. But apart from any special justification,
there certainly prevails an amiable willingness to take upon
faith plausible figures that fill a pressing want. And the
stock index numbers have been published in the form that
makes new figures most alluring — the paucity of explanations
and warnings has encouraged readers to use or misuse
the results without undergoing the mental toil of criticism
or the moral strain of doubt. As for the cautious
minority, they have been foiled by this same simplicity
of presentations; they have been given few materials
wherewith to determine the representative value of the original
quotations, to judge the appropriateness of the methods
used, or to compare the results of rival series. . . .

1 Adapted with permission from Mitchell, Wesley C., "A Critique of
Index Numbers of the Prices of Stocks," in Journal of Political Economy,
July, 1916, Vol. XXIV, pp. 625-631.


The Fundamental Difference between Stock and Commodity
Index Numbers. — In several respects stock prices
are more satisfactory data for statistical analysis than
commodity prices. Stock dealings are more highly centralized
and more thoroughly organized than dealings in most
commodities. The prices are reported with unexcelled fullness
and accuracy. While the number of stocks for which
frequent and regular quotations can be collected for
considerable periods is less than the corresponding number
of commodities at wholesale, it doubtless forms a larger
proportion of the whole list dealt in. Once more, stocks
are quoted in terms of a nominally uniform unit — the share
with a par value of $100, or some multiple that can readily
be changed into the standard unit. Hence the actual
prices can be compared, summed, and averaged with a
facility lacking when one handles commodity prices.
Concerning the authenticity and the representative character
of stock quotations, in short, there are fewer doubts than
haunt the mind of the field-worker in a commodity-price
investigation.

It is when one begins to interpret these quotations that
doubts become grave. First there is the familiar question:
What does the share with a par value of $100 really
mean? Second, there is the assurance that whatever that
unit in one corporation means this year, it will probably
mean something different next year. Commodities are
tangible substances, measured by physical units, and in
making index numbers one rejects articles that are not
substantially uniform in quality over long periods. Business
enterprises, on the contrary, are essentially variable
entities, and shares in them are subject to changes that
affect the enterprises, and to other changes as well. The
Pennsylvania Railroad, for example, is a remarkably stable
corporation; yet its physical property, its security holdings,
its leases, its indebtedness, its earnings and expenses,
its financial affiliations, its relations to regulating
commissions, and a hundred other matters that affect the market
value of its shares, all vary constantly or intermittently.
To cite only the one crude gauge: the Pennsylvania system
counted about 7600 miles of railway in 1890 and about
11,800 miles in 1915. And in this changing property a share
of common stock in 1890 represented ownership of one part
in 2,451,354, whereas in 1915 one share represented ownership
of one part in 9,985,314. Stocks, then, are variable
fractions of variable wholes, and their prices fluctuate
incessantly because of changes in the thing quoted, as well
as for other reasons.

From such facts it is sometimes inferred that index
numbers of stock prices have no valid use except for
short-period comparisons; or that an index number covering
decades is no better when it excludes than when it admits
numerous substitutions of one stock for another. In any
case, the argument runs, comparisons of stock prices in
years far apart are comparisons of dissimilar goods; they
are like comparisons of the prices of potatoes and silk in
1890 with the prices of pig iron and tea in 1915.

Such conclusions, however, are rash. The fact that
stocks change as commodities do not, proves merely that
stock index numbers must not be interpreted as meaning
precisely what commodity indexes mean. It does not
prove that stock indexes are meaningless, or that alterations
in the list of securities included in them are unobjectionable.
Business enterprises, indeed, are more like men than they
are like commodities. Commodities are produced and
consumed; then produced afresh in the old forms. Business
enterprises have a continuous life; they undergo great
changes of expansion, contraction, even reorganization,
without losing their identity. And this continuity of business
enterprises and of shares in them is a fact of great
practical importance. The many individuals and corporations
that hold stock in the same business enterprises for
years at a time are deeply concerned with long-period
changes in the prices of their securities. The like holds
true with reference to the "investing public" as a unit,
and to its security holdings as an aggregate. Even the
wider public in its efforts to regulate corporation charges
and corporation finance through governmental commissions
needs to know the course followed by security prices in
particular and in general. The fluctuations of New Haven stock
between the early nineties and the present are not rated a
matter of indifference; neither are the very different
fluctuations of Pennsylvania stock, nor the still different
fluctuations of Lackawanna. Nor is it unimportant to find out
which type of fluctuations has been characteristic of American
stocks at large.

Stock indexes, then, differ from commodity index numbers
in that they show, not variations in the prices of
unvarying goods, but variations in the prices of goods that
maintain their identity despite continual changes in quality.
This difference enhances the difficulty both of making and
of using them; but it does not destroy their logical
legitimacy or their practical importance. . . .

The Uses of Stock Index Numbers. — An index number
is a statistical device made to serve certain ends. Hence
the logical first step toward an evaluation of any such series
is to define precisely the end which the finished results are
to serve. That done, one has a criterion by which to judge
the merits and defects of the series already in use, and by
which to guide his own efforts in making new series.


The trouble with this seemingly promising lead is that
stock index numbers are put to so many and such varied
uses as to give little help in defining what is wanted. An
economist may seek to measure changes in the purchasing
power of money over stocks, a speculator may wish
to forecast the probable future course of the market, a
public commission may be interested in the terms on which
corporations can raise new capital, a publicist may
investigate the claim that government regulation has brought
loss upon investors, a financial historian may wish to mark
off periods of expansion and contraction, a trustee may
inquire whether the fluctuations of his security holdings
have compared favorably with the average course of the
market, an insurance company may seek light on the probable
future of interest rates, a student may wish to compare
stock fluctuations with the price fluctuations of commodities
at wholesale or retail, of labor, of bonds, of farm lands,
of securities in other countries, etc. Now, each one of those
people will have use for a stock index. But the more carefully
these various uses are analyzed the clearer it becomes
that their requirements differ. The character and the number
of stocks to be included, the frequency of the quotations
needed, the period of time covered, whether actual or relative
prices should be used, the desirability of making
subgroups and their basis, the kind of average appropriate,
the necessity of considering deviations from the mean,
whether weights should be introduced and if so what is the
proper criterion of "importance" — these and the other
points of technique that arise in making an index number
would not all be decided precisely alike in any two of
the cases suggested, did uses strictly dictate methods —
as logically they should.

Ideally, every distinct use should have a distinct index
number especially designed for it. Practically, however,
cases are few when the consumer of statistics has the technical
skill and can spend the time and money to make a
series exactly answering his needs. What happens is that he
uses for his special purposes one of the series published by
others — more often than not without realizing that the
figures in question are in certain respects ill adapted to his
needs. Frequently the user does not even hit upon that
one among the published series which is least unsuited to
his case. And this situation promises to change but slowly.
Probably the published series will long continue to be used
as "general-purpose" index numbers. And a "general-purpose"
index number is too indefinite a conception to
guide one surely through the maze of choices that are
involved in making a new series or in ranking old ones.

Under these confusing circumstances, what can we attempt
with any prospect of success? We cannot discuss
the merits of stock index numbers at large with reference
to their uses, because these uses and their several requirements
are so multifarious. Our best hope seems to lie in
reversing the problem. That is, we can analyze stock
index numbers, old and new, to find out of what materials
and by what methods they are made. Then we can discuss
their uses with reference to their construction. Finally,
we can determine what fluctuations in the prices of
stocks can be measured most accurately and by what means.
The index number which stands first in this test will have
special claims to acceptance, except for uses which require
some radically different series, less accurate though it be.

REVIEW 

1.  Contrast  commodity  and  stock  prices  in  relation  to  index 
number  making  and  using. 


2. What is Professor Mitchell's approach to a study of stock
indexes? In what way does he contrast general- and special-purpose
numbers? In what ways is his discussion paralleled in the Text?

Weighting and the Making of Stock Index Numbers ¹

So long as statisticians expected but rough results from
their index numbers of commodity prices at wholesale, they
treated systematic weighting as a theoretical refinement
in method which made little difference in the results. What
pleased them was to find that their simple and weighted
averages showed the same general trend. But as experience
has demonstrated that under favorable circumstances
the margin of uncertainty in such work may be reduced
to less than, say, 10 per cent of the results, makers of
commodity index numbers have begun to regard proper weighting
as practically important. Is it important also in making
index numbers of stock prices?

1 Adapted with permission from Mitchell, Wesley C., "A Critique of
Index Numbers of the Prices of Stocks," in Journal of Political Economy,
July, 1916, Vol. XXIV, pp. 684-691.

Hitherto most stock indexes have been "simple" averages
of actual or relative prices. Now simple averages are not
averages into which no weights enter, or in which all stocks
have the same weights; they are really averages in which the
weights have not been systematically planned but left to chance.
What degree of influence any stock in a given sample will
exercise upon the results in a simple series depends both
upon the original quotations and upon the way in which
they are worked up. For example, an arithmetic mean
of actual prices in effect assigns heavy weights to the stocks
that command high prices per share and light weights to
stocks that are cheap. But if these actual prices are turned
into relatives and the arithmetic means are made from the
latter figures, the weighting is likely to be revolutionized. For
now the influence of a given stock depends on a radically
different factor, not on its price in dollars and cents as
compared with the prices of other stocks in the sample,
but on the percentage which the price on the date in question
bears to the price of the same stock in the period chosen
as base as compared with the corresponding percentages
for the other stocks. A shift to a new base commonly
alters the relative magnitude of these percentages and therefore
changes the weights once more. Finally, the substitution
of geometric means or medians for arithmetic means
gives an entirely new twist to the whole situation. In a
geometric mean the influence of a stock depends upon the
comparative magnitude of the ratios of change which its
price undergoes, and it matters not whether actual or
relative prices are used or on what base the relative prices
are computed, for none of these matters affect the ratios
of change, which alone count. In a median it does make
a difference whether actual or relative prices are averaged
and on what base the relatives are computed; but the
influence which any stock exercises upon the result depends
solely on whether its actual or relative price happens to
be at, above, or below the middle of the whole series after
the data have been arranged in numerical order. The
magnitude of its deviation from the middle position has no
effect.
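
A small numerical experiment may make these tacit weights visible. The sketch below is in Python; the three stocks and their quotations are invented purely for illustration and do not come from Professor Mitchell's data.

```python
from statistics import geometric_mean, median

# Hypothetical quotations (dollars per share) at a base date and a later
# date. Stock A is dear and rises 10 per cent; stock C is cheap and doubles.
base = {"A": 200.0, "B": 50.0, "C": 10.0}
later = {"A": 220.0, "B": 50.0, "C": 20.0}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Arithmetic mean of actual prices: the high-priced stock dominates.
index_actual = 100 * mean(later.values()) / mean(base.values())

# Arithmetic mean of relatives: influence now follows the percentages.
relatives = {k: 100 * later[k] / base[k] for k in base}
index_relatives = mean(relatives.values())

# Geometric mean: identical whether actual prices or relatives are used.
gm_from_actual = (100 * geometric_mean(list(later.values()))
                  / geometric_mean(list(base.values())))
gm_from_relatives = geometric_mean(list(relatives.values()))

# Median of relatives: only position in the ordered array counts.
median_relatives = median(relatives.values())

print(round(index_actual, 1), round(index_relatives, 1),
      round(gm_from_relatives, 1), median_relatives)
```

With these figures the mean of actual prices rises far less than the mean of relatives, because the doubling of the cheap stock barely moves the former, while the two geometric means coincide.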

Since all index numbers are really weighted, the only
question is whether these weights should be tacit or avowed,
obscure or clear, left to chance or controlled on some
intelligible principle. This question is one of great moment,
particularly when one is dealing with stocks. For
different schemes of systematic weighting produce large
differences in results, when the weights themselves differ
notably. Different schemes of haphazard weighting tacitly
introduced by changing from averages of actual prices to
averages of relatives, or by shifting the base on which
relatives are computed, cause wide divergences. Finally, in
most cases the series with systematic weights and the series
with haphazard weights differ from each other at least as
much as they differ among themselves. If systematic
weighting is desirable in making commodity indexes where
it leads to comparatively moderate differences in results,
a fortiori it is desirable in making stock indexes where the
differences produced in results are much wider.

Few men would hesitate to say that the price of Pennsylvania
stock is more important than the price of Duluth,
South Shore & Atlantic stock and deserves to have more
weight in an index number. It is more important because
there is more Pennsylvania stock in the hands of investors,
individual and corporate; because the Pennsylvania does
much the bigger business; because Pennsylvania stock is
a more important article of commerce — more of it changes
hands year by year.

These  three  reasons  imply  three  different  criteria  of  the 
importance  of  a  given  stock,  criteria  upon  which  may  be 
based  three  sets  of  weights,  each  of  which  is  appropriate 
for  special  ends.  If  the  aim  is  to  show  the  average  changes 
in  the  prices  of  securities  held  by  the  public,  the  amount 
of  stock  outstanding  yields  the  logical  set  of  weights.  If 
the  aim  is  to  throw  light  on  the  changes  in  the  prices  of 
business  enterprises  as  such,  then  gross  earnings,  the  best 
available  gauge  of  volume  of  business  transacted,  may  be 
used  as  weights.  If  the  aim  is  to  find  average  changes 
in  the  prices  of  stocks  that  are  traded  in,  then  the  number 
of  shares  sold  should  be  used.  Other  aims  might  make 
still  other  systems  of  weights  desirable.  .  .  . 


But to what ought these weights be applied — to actual
prices or to relative prices worked out on some chosen base?
That is equivalent to the question: What weights ought
to be used on the actual prices? For any average of relative
prices is itself a weighted average of actual prices in
disguise. For example, index numbers made by averaging
relative prices on the 1890-1899 base are equivalent to
averages of actual prices weighted by the factors required
to make the average actual price of each stock in that
decade equal 100. . . . Similarly the index numbers of relative
prices on the preceding year base are averages of actual
prices, each weighted by the multiplier which makes its
price in the year before equal 100.
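
The equivalence just stated can be verified directly. The following Python sketch uses invented prices for three hypothetical stocks; the variable names are illustrative only.

```python
# Hypothetical prices, invented for illustration.
base_price = {"A": 200.0, "B": 50.0, "C": 10.0}  # average price in base period
price_now = {"A": 220.0, "B": 50.0, "C": 20.0}   # price at the date compared

n = len(base_price)

# Simple average of relative prices on the chosen base.
avg_of_relatives = sum(100 * price_now[k] / base_price[k]
                       for k in base_price) / n

# The same figure as an average of ACTUAL prices, each multiplied by the
# factor that makes its base-period price equal 100.
disguised_weight = {k: 100 / base_price[k] for k in base_price}
weighted_actual = sum(disguised_weight[k] * price_now[k]
                      for k in base_price) / n

print(avg_of_relatives, weighted_actual)
print(disguised_weight)
```

Here a dollar's change in the ten-dollar stock counts twenty times as much as a dollar's change in the two-hundred-dollar stock, which is the weight "in disguise."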

In  weighting  relative  prices,  then,  we  are  weighting 
already  weighted  actual  prices.  Upon  the  final  result, 
therefore,  each  stock  will  have  an  influence  proportioned, 
not  to  its  figure  in  the  formal  scale  of  weights,  but  to  this 
figure  combined  with  its  actual-price-times-another-weight. 
Likewise,  in  weighting  actual  prices  themselves  we  give 
each  stock  an  influence  upon  the  result  which  depends 
not  simply  on  the  weight,  but  upon  the  product  of  the 
weight  times  the  price.  .  .  . 

The first step in weighting, therefore, should be to decide
what proportionate influence we wish each stock to
exercise upon the final results. Of course that depends upon
the end in view. For example, in measuring the changes
in the market value of stocks held by investors, the importance
of each of our sample stocks depends both on the
amount in the hands of investors and on the actual price.
Weights based on amounts outstanding should therefore
be applied to actual prices. If for some other purpose we
think that the fluctuations of each stock should have an
influence proportionate simply to the gross earnings of each
corporation, then we should not apply weights based upon
earnings directly to actual prices, but should first make
the average actual prices of all the stocks the same for the
period covered by applying one set of equalizing weights,
and then multiply these equated prices by the weights based
upon earnings. In this case, however, it would be quicker
to begin by making the two sets of weights into one, and
then to multiply the actual prices of the stocks by the
consolidated weights.
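
The two-step procedure and its consolidated shortcut may be sketched as follows, in Python. The two corporations, their prices, and their earnings are hypothetical, and the division by total earnings is an added convenience so that the result reads as an index with the period average equal to 100.

```python
# Hypothetical figures, invented for illustration only.
avg_price = {"X": 60.0, "Y": 5.0}         # average actual price over the period
gross_earnings = {"X": 190.0, "Y": 10.0}  # weights based upon earnings
price_now = {"X": 57.0, "Y": 6.0}         # actual prices at the date in question

total_earnings = sum(gross_earnings.values())

# Equalizing weights make every stock's period-average price equal 100.
equalizing = {k: 100 / avg_price[k] for k in avg_price}

# Two-step method: equate the prices, then apply the earnings weights.
equated = {k: equalizing[k] * price_now[k] for k in price_now}
two_step = sum(gross_earnings[k] * equated[k] for k in equated) / total_earnings

# Quicker method: merge both sets of weights into one beforehand.
consolidated = {k: equalizing[k] * gross_earnings[k] for k in avg_price}
one_step = sum(consolidated[k] * price_now[k] for k in price_now) / total_earnings

print(round(two_step, 2), round(one_step, 2))
```

Since the consolidated weights do not depend on the current quotations, they need be worked out only once for the whole period.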

REVIEW 

1. Professor Mitchell seems to distinguish between the theoretical
and practical aspects of weighting. State his distinction, and compare
it with the discussion of the same subject in the Text.

2. How differently do weights operate in simple averages of
actual prices, and of relative prices? How differently in the case
of medians; in the case of geometric means?

3. Defend the writer's contention that "all index numbers are
really weighted."

4.  What  criteria  of  importance  may  be  used  in  selecting  weights 
for  stock  indexes  ?     Under  what  conditions  should  each  be  selected  ? 

5.  To  what  form  of  the  price  data  ought  weights  to  be  applied  ? 
Why  is  this  an  important  question  ? 

Conclusions on the Making of Stock Index
Numbers ¹

The choice of methods in making an index number of
stocks should be guided by the specific purpose in view.
It follows that the index number that is best for any purpose
depends upon the specific phase of price fluctuations
which that purpose requires to be measured.

1 Adapted with permission from Mitchell, Wesley C., "A Critique of
Index Numbers of the Prices of Stocks," in Journal of Political Economy,
July, 1916, Vol. XXIV, pp. 691-693.


Strictly interpreted, this obvious but often-neglected
rule bars out the question: What is the best index number
at large? Perhaps there is no single series that is not
"the best" for some imaginable use. But, by way of
conclusion, we may point out what fluctuations in the prices
of stocks can be measured with the narrowest margin of
error, and argue that the index number which best represents
these most measurable fluctuations is the best "general-purpose"
series; the index number to be recommended
for use by the general reader, and by the specialist also,
when his particular aim does not definitely demand some
differently constructed series, in spite of its inferior accuracy.

Along this line a confident opinion can be given. Geometric
means of the ratios of change in quotations within
brief periods, such as from one year to the next, have been
shown to be the most accurate measures of fluctuations in
the prices of stocks. . . .

For measuring fluctuations covering longer periods of
time geometric means are again the most representative
averages. But the farther apart grow the years between
which price comparisons are made the less accurate grow
the results obtained from a given body of quotations and
the smaller grows the list of stocks for which continuous
series of quotations can be had. It is true that the successive
percentages of change in price from one year to the
next can be multiplied into each other to make a continuous
"chain index"; but, while each link has a narrow margin
of error, the errors are cumulative, so that a comparison
between the two ends of the chain becomes less trustworthy
the longer the chain is made. Of course the same
difficulty inheres in the relative prices on a fixed base that
may be made from the geometric means of actual prices.
No refinement of methods can mend the fundamental defect
of the data. The ratios of change in stock prices between
years far apart are so widely and irregularly scattered that
no average made from them can have a high representative
value.
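
The construction of such a chain can be sketched in a few lines of Python. The quotations below are invented, and the function names are illustrative. With an unchanged list of stocks and exact quotations, the chain of geometric-mean links agrees with the direct comparison of the end years; the cumulative error spoken of above enters when each link is itself only an estimate from a sample, or when the list of stocks changes from link to link.

```python
from statistics import geometric_mean

# Hypothetical yearly quotations for three stocks (invented figures).
prices = {
    1890: {"A": 100.0, "B": 40.0, "C": 20.0},
    1891: {"A": 110.0, "B": 36.0, "C": 25.0},
    1892: {"A": 99.0, "B": 45.0, "C": 25.0},
}


def link(prev, cur):
    """Geometric mean of the ratios of change for stocks quoted in both years."""
    common = sorted(prev.keys() & cur.keys())
    return geometric_mean([cur[k] / prev[k] for k in common])


def chain_index(quotations, start=100.0):
    """Multiply successive year-to-year links into one continuous series."""
    years = sorted(quotations)
    index = {years[0]: start}
    for prev_year, year in zip(years, years[1:]):
        index[year] = index[prev_year] * link(quotations[prev_year],
                                              quotations[year])
    return index


chain = chain_index(prices)
direct = 100.0 * link(prices[1890], prices[1892])
print({year: round(value, 1) for year, value in chain.items()}, round(direct, 1))
```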

The  best  way  to  diminish,  since  we  cannot  remove,  this 
difficulty  is  to  break  the  long  periods  up  into  parts,  to  com- 
pute fresh  index  numbers  for  each  part,  and  to  string  these 
index  numbers  together.  The  advantages  of  this  shift  are 
(1)  that  a  larger  "sample"  of  stocks  with  continuous  quo- 
tations can  be  had  for  short  periods,  and  (2)  that  the  fixed- 
base  relatives  will  show  a  less  irregular  distribution.  Pushed 
to  extremes,  this  course  would  lead  to  the  making  of  a 
geometric-mean  index  number  of  all  stocks  quoted  both  in 
1890  and  in  1891,  and  of  a  second  index  number  of  all  stocks 
quoted  both  in  1891  and  in  1892,  and  so  on  to  date.  The 
main  defect  of  such  a  series,  after  the  yearly  percentages 
had  been  linked  together  in  a  chain  index,  would  be  that  no 
one  could  be  sure  what  part  of  the  fluctuations  shown  was 
due  to  change  in  prices  and  what  part  to  changes  in  the 
stocks  quoted.  Hence  price  comparisons  between  1890 
and  1915  would  still  be  dubious.  Perhaps  a  middle  course 
is  the  least  objectionable  :  Make  a  new  index  number  from  a 
new  sample  of  stocks  every  ten  or  twenty  years,  using  geo- 
metric means ;  each  time  that  a  new  series  is  made  compute 
overlapping  figures  for  a  few  years  both  from  the  old  and 
from  the  new  samples:  find  what  part  of  the  changes  in 
those  years  is  due  to  alterations  in  the  list  of  stocks,  and, 
finally,  allow  for  these  differences  as  well  as  may  be  in  join- 
ing the  two  index  numbers  together.  The  price  compari- 
sons that  could  be  extended  in  this  way  over  long  periods 
of  time  would  not  indeed  possess  the  accuracy  of  our  year- 
to-year  figures,  but  they  would  be  more  trustworthy  than 
any  of  the  fixed-base  series. 
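
The extreme course described above, a geometric mean of year-to-year relatives for whatever stocks are quoted in both of two adjacent years, linked into a chain, may be sketched as follows. The quotations and the function names are hypothetical and serve only to illustrate the arithmetic; they are not Professor Mitchell's figures.

```python
from math import prod


def geometric_mean(values):
    """Geometric mean of a collection of positive numbers."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


def chain_index(prices_by_year, base=100.0):
    """Link year-to-year geometric-mean relatives into a chain index.

    prices_by_year: list of {stock: price} dicts, one per year, in order.
    Each link uses only the stocks quoted in both adjacent years.
    """
    index = [base]
    for prev, curr in zip(prices_by_year, prices_by_year[1:]):
        common = sorted(set(prev) & set(curr))
        link = geometric_mean(curr[s] / prev[s] for s in common)
        index.append(index[-1] * link)
    return index


# Hypothetical quotations for three successive years.
quotes = [
    {"A": 50.0, "B": 80.0},
    {"A": 55.0, "B": 80.0, "C": 20.0},
    {"B": 88.0, "C": 22.0},
]
series = chain_index(quotes)
print([round(x, 2) for x in series])
```

Stock A drops out and stock C enters between the links, so the chained figures mix changes in price with changes in the list of stocks, which is the defect the passage names.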



REVIEW 

1.  State  Professor  Mitchell's  general  conclusion. 

2.  Would  his  conclusion  apply  to  commodity  wholesale  prices ; 
to  commodity  retail  prices  designed  to  measure  changes  in  the 
"cost  of  living"?  Why?  (Answer  these  questions  in  the  light  of 
both  the  Text  discussion  and  the  above  article.) 

REVIEW   PROBLEMS 

Index  Numbers  and  Averages 

1.  Change  of  Base  and  Use  of  the  Arithmetic  Mean.  Average 
of  Relatives.      (See  Text,  page  318  and  note  2.) 

Using  the  absolute  price  data  on  page  290  of  the  Text  recompute 
a  simple  average  of  relatives  price  index  number  for  each  of  the  years 
with  1914  as  the  base.  Compare  the  numbers  as  thus  determined 
with  those  for  1912  and  1913  obtained  by  dividing  the  indexes  for 
these  years,  as  given  on  page  296,  by  the  1914  number. 

What  conclusions  do  you  draw  from  this  experiment  relative  to 
the  methods  of  base  shifting  when  dealing  with  an  average  of  rela- 
tives index  number?  In  what  respects  are  the  contents  of  note  2 
on  page  318  borne  out? 
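
The comparison asked for in Problem 1 may be sketched as below. The prices are hypothetical stand-ins, not the data on page 290 of the Text; the sketch shows only how the short-cut (dividing through by the 1914 number) and the full recomputation on the 1914 base can be set side by side.

```python
def mean_of_relatives(prices, base_year, year):
    """Simple arithmetic average of price relatives (base year = 100)."""
    relatives = [100.0 * p[year] / p[base_year] for p in prices.values()]
    return sum(relatives) / len(relatives)


# Hypothetical absolute prices for two commodities.
prices = {
    "corn": {1912: 50.0, 1913: 60.0, 1914: 75.0},
    "cotton": {1912: 12.0, 1913: 12.0, 1914: 8.0},
}
years = (1912, 1913, 1914)

# Index on the 1912 base.
index_1912_base = {y: mean_of_relatives(prices, 1912, y) for y in years}

# Short-cut: divide each index by the 1914 number.
shortcut = {y: 100.0 * index_1912_base[y] / index_1912_base[1914] for y in years}

# Full method: recompute every relative on the 1914 base.
recomputed = {y: mean_of_relatives(prices, 1914, y) for y in years}

for y in years:
    print(y, round(shortcut[y], 1), round(recomputed[y], 1))
```

With these assumed prices the two methods disagree for 1912 and 1913, which is the kind of discrepancy the problem invites the student to look for in the Text data.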

2.  Comparison  of  a  Simple  and  of  a  Weighted  Index  Number 
Series. 

Using  the  price  data  on  page  290  of  the  Text,  compute  weighted 
average  of  relatives  index  numbers  for  1912,  1913,  1914.  Compare 
the  weighted  with  the  simple  numbers.  Arrange  the  data  in  the 
form  of  a  table,  properly  label  it  and  give  it  a  correct  title. 

Use  the  following  weights  : 

Item         Weight

Total          353

Corn            85
Cotton          33
Oats            15
Hay              1
Hides            6
Cattle          90
Hogs           123
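
A minimal sketch of the weighted computation follows. The weights are those given above; the relatives are hypothetical figures for a single year, since the price data of the Text are not reproduced here.

```python
# Weights as given in the problem (they total 353).
WEIGHTS = {
    "Corn": 85, "Cotton": 33, "Oats": 15, "Hay": 1,
    "Hides": 6, "Cattle": 90, "Hogs": 123,
}

# Hypothetical price relatives for one year (base year = 100).
relatives = {
    "Corn": 110.0, "Cotton": 95.0, "Oats": 105.0, "Hay": 90.0,
    "Hides": 120.0, "Cattle": 100.0, "Hogs": 115.0,
}


def simple_mean(rel):
    """Unweighted arithmetic average of relatives."""
    return sum(rel.values()) / len(rel)


def weighted_mean(rel, weights):
    """Arithmetic average of relatives, each multiplied by its weight."""
    total_weight = sum(weights[name] for name in rel)
    return sum(rel[name] * weights[name] for name in rel) / total_weight


simple = simple_mean(relatives)
weighted = weighted_mean(relatives, WEIGHTS)
print(round(simple, 2), round(weighted, 2))
```

Because hogs carry more than a third of the total weight, the weighted number is pulled toward the hog relative; the same function applied to each year gives the series to be tabulated.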



3. Use of Median Index Numbers in Simple and Weighted Series.

By using the commodities and prices on page 296 of the Text,
compute a median of relatives index number for each of the years
1912, 1913, and 1914. Compare these with medians obtained when
the following weights are used :

Commodities  Weights

Total          353

Corn            85
Cotton          33
Oats            15
Hay              1
Hides            6
Cattle          90
Hogs           123

(1)  What  effect  do  the  weights  seem  to  have? 

(2)  Would  this  be  true  if  another  system  of  weights  were  used? 

(3)  Would  this  be  true  if  the  order  of  the  weights  assigned  to 
hay,  oats,  corn,  and  cattle  were  changed  ?  If  the  remaining  weights 
were  concentrated  on  the  commodity  cotton?  Would  the  degree 
of  concentration  be  significant  ?     Why  ? 

(4) Which of the above questions would you answer differently
in case the arithmetic mean were used? In what way?

(5)  Arrange  your  data  in  the  form  of  a  table,  properly  label  it, 
and  give  it  a  correct  title. 
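
One way of arriving at a weighted median is sketched below. The convention adopted (arrange the relatives in order of size and take the first one at which the accumulated weight reaches half the total) is an assumption, as the problem does not prescribe a rule; the relatives are again hypothetical.

```python
from statistics import median

WEIGHTS = {
    "Corn": 85, "Cotton": 33, "Oats": 15, "Hay": 1,
    "Hides": 6, "Cattle": 90, "Hogs": 123,
}

# Hypothetical price relatives for one year (base year = 100).
relatives = {
    "Corn": 110.0, "Cotton": 95.0, "Oats": 105.0, "Hay": 90.0,
    "Hides": 120.0, "Cattle": 100.0, "Hogs": 115.0,
}


def weighted_median(values, weights):
    """Relative at which cumulative weight first reaches half the total."""
    total = sum(weights[name] for name in values)
    running = 0
    for name in sorted(values, key=values.get):  # ascending by relative
        running += weights[name]
        if 2 * running >= total:
            return values[name]


simple = median(relatives.values())
weighted = weighted_median(relatives, WEIGHTS)
print(simple, weighted)
```

Rearranging the weights among the commodities, or concentrating them on one, changes where the half-way point of accumulated weight falls, which bears on questions (2) and (3).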

4. Base Shifting and the Use of the Median. (Text, page 322.)

Using the unweighted medians of relative prices determined in
Problem 3, shift the base to 1914 by dividing through by the 1914
number. Compare these results with those obtained by recomputing
throughout the relatives on the 1914 base. (See data on Problem 1.)

In  what  ways  do  your  results  bear  out  the  contention  of  the  Text, 
p.  322,  relative  to  the  use  of  median  in  Index  Numbers  ?      Be  specific. 


CHAPTER   IX 

DESCRIPTION  AND   SUMMARIZATION  — DISPERSION 

AND  SKEWNESS 

The Nature of Statistical Knowledge ¹

A  careful  consideration  of  the  history  of  statistical 
science  leads  to  the  conclusion  that  statistical  methods 
are  used  for  two  sorts  of  purposes,  or  to  gain  two  sorts  of 
knowledge about events or things.

A.  On  the  one  hand  the  statistical  method  finds  one  of 
its  chief  uses  in  furnishing  a  method  (and  the  only  one 
known  in  science)  of  describing  a  group  in  terms  of  the 
group's  attributes,  rather  than  in  terms  of  the  attributes 
of  the  individuals  which  compose  the  group.  .  .  . 

What  sorts  of  positive,  definite,  and  exact  knowledge 
do  statistics  give  us? 

1.  Precise  knowledge  of  the  composition  of  groups  or 
masses.  This  is  the  knowledge  gained  by  counting.  Sup- 
pose we  find  a  basket  containing  a  number  of  balls  of  sev- 
eral different  colors,  and  proceed  to  count  them  with  the 
following  results : 

7  Reds 

9  White 

2  Black 

1  Green 

1 Adapted with permission from Pearl, Raymond, Modes of Research in
Genetics, Macmillan, 1915, pp. 79-100.



Such  a  count  furnishes  us  at  once  with  a  great  deal  of 
perfectly  definite  and  precise  information  about  this  group 
or  population  of  balls.  For  example,  the  count  tells  us 
that  it  will  never  be  possible  to  draw  more  than  one  pair 
of  balls  of  which  one  member  is  green.  This  is  a  definite 
attribute  of  this  population  which  may  be  used  to  differ- 
entiate it  from  other  populations.  In  this  particular  popu- 
lation only  one  green  ball  occurs. 

This  sort  of  knowledge  derived  by  counting  is  perfectly 
definite  and  precise  so  far  as  it  relates  to  the  particular  group 
or  mass  which  it  concerns  in  any  particular  case.  It  does 
not  involve  any  approximation,  or  probability,  and  is  as 
precise  as  knowledge  of  the  individual.  It,  however,  per- 
tains to  the  group.  It  forms  a  part  of  a  proper  scientific 
description  of  a  group. 

2.  Knowledge  of  certain  abstract  qualities  of  groups  or 
masses.  This  knowledge  is  obtained  by  calculation  from 
the  counted  data.  The  more  important  of  the  abstract 
qualities  of  groups  are : 

a.  The  center  or  typical  condition  of  the  group ;  or  the 
condition  about  which  the  individuals  composing  the  group 
cluster.  This  is  variously  measured :  by  the  arithmetic 
mean,  which  gives  the  center  of  gravity  of  the  group,  by 
the  median,  which  tells  the  point  on  either  side  of  which 
exactly  half  the  individuals  fall,  by  the  mode,  which  tells  the 
point  of  greatest  frequency  of  occurrence  in  the  group,  etc. 

b. The degree of individual diversity comprised in the
group.  This  attribute,  called  the  variability  of  the  group, 
is  again  variously  measured :  by  standard  deviations,  co- 
efficients of  variation,  etc. 

c.  The  degree  of  symmetry  of  the  distribution  of  the  indi- 
viduals composing  the  group.  This  is  measured  by  the 
skewness  or  other  related  constants.  .  .  . 
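
The three abstract qualities just listed may be computed for a small group as in the sketch below. The measurements are hypothetical, and the particular constants chosen (standard deviation taken about the mean of the whole group, coefficient of variation as a percentage, and Pearson's measure of skewness, mean minus mode divided by standard deviation) are assumptions; the passage names the quantities without fixing formulas.

```python
from statistics import mean, median, mode, pstdev

# Hypothetical measurements on a group of five individuals.
values = [2, 3, 3, 4, 8]

# a. Center or typical condition of the group.
center_mean = mean(values)      # center of gravity
center_median = median(values)  # half the individuals on either side
center_mode = mode(values)      # point of greatest frequency

# b. Variability of the group.
std_dev = pstdev(values)                      # standard deviation of the group itself
coeff_variation = 100.0 * std_dev / center_mean

# c. Symmetry of the distribution (Pearson's measure, assumed here).
skewness = (center_mean - center_mode) / std_dev

print(center_mean, center_median, center_mode)
print(round(std_dev, 3), round(coeff_variation, 1), round(skewness, 3))
```

The group standard deviation (`pstdev`) is used rather than the sample estimate because, as the passage goes on to insist, attention is confined to the particular group measured.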



One  point  here  we  must  be  quite  clear  about.  This  is 
that  the  kind  of  knowledge  discussed  under  this  heading  2 
is just as definite and precise, and involves as little approximation
and indeterminism, as does any piece of individualistic
knowledge, so long as we confine our attention
solely  to  the  particular  group  discussed  in  a  particular  single 
case.  We  are  accustomed  to  stating  means,  for  example, 
with  probable  errors.  But  this  is  only  because  it  is  proposed 
to  extend  the  conclusions  beyond  or  outside  of  the  partic- 
ular group  and  the  particular  instance  for  which  the  mean 
was  calculated.  For  that  group  and  that  instance  the  mean 
is  perfectly  exact  and  precise  to  that  degree  of  precision 
denoted  by  the  unit  of  measure  used,  assuming  that  no 
arithmetical  mistakes  have  been  made  in  its  computation. 
Thus  suppose  one  measures  the  stature  of  three  men  to  the 
nearest  inch,  and  then  calculates  the  average.  The  result 
is,  without  any  probable  error,  the  average  height,  at  the 
particular  moment  when  they  were  measured,  of  those  three 
men  exact  to  the  unit  of  measurement  used.  It  describes 
and  measures  precisely  an  attribute  of  those  men  con- 
sidered as  a  group.  But  if  we  were  to  consider  this  result 
from  the  viewpoint  of  whether  it  gave  a  reasonable  meas- 
ure of  the  average  height  of  men  in  general,  or  from  the 
viewpoint  of  whether  it  gave  a  proper  value  for  the  mean 
height  of  these  men  when  repeatedly  measured  under  vary- 
ing conditions,  it  would  clearly  be  subject  to  a  large  prob- 
able error.  It  would,  in  point  of  fact,  have  lost  its  char- 
acter of  precise  and  definite  knowledge,  and  have  become  a 
more  or  less  poor  approximation. 

3.  Precise  knowledge  of  the  degree  of  association  or  con- 
tingency between  different  events  or  characters  within  a 
group. This is furnished by the method of correlation
in  one  or  another  of  its  various  forms.     By  this  general 



method  we  are  able  to  measure  precisely  the  degree  of  re- 
semblance between  the  individuals  composing  a  group 
in  respect  to  one  or  more  characters.  So  long  as  attention 
is confined to the particular group on which the measurement
is made, and to that group alone, and to a single
instance  (in  time)  the  knowledge  gained  is  precise.  It  is 
a  part  of  the  description  of  the  attributes  of  that  group. 
When  we  pass  from  that  particular  group  to  other  groups, 
or  individuals,  our  results  are  no  longer  precise,  but  in- 
ferential, and  the  probable  errors  tell  us  something  about 
the  degree  to  which  the  inference  is  trustworthy. 

Summarizing  the  results  of  the  above  analysis,  we  see 
that  the  statistical  method  can 

1.  Furnish  precise  descriptive  knowledge  about  groups. 
This  knowledge  is  of  various  sorts.  It  is  definite  and  pre- 
cise so  long  as  attention  is  confined  solely  to  the  particular 
group  and  the  particular  instance  on  which  it  is  based. 

2.  The  knowledge  gained  by  the  statistical  method,  as 
we have analyzed it above, precise though it may be, pertains
to the group and not to the individual. It is exact
knowledge  about  the  composition,  or  attributes,  or  con- 
tingencies of  masses  or  groups. 

3.  This  ability  to  describe  groups  in  terms  of  the  groups' 
own  attributes,  which  is  an  unique  property  of  the  statis- 
tical method,  is  extremely  useful  in  the  practical  conduct 
of  scientific  investigations.  It  makes  the  statistical  method 
an  absolutely  essential  adjunct  to  every  other  scientific 
method,  and  particularly  to  the  experimental.  This  fact 
is  just  now  beginning  to  be  recognized  by  some  experi- 
mentalists and  hailed  as  a  rather  original  thought.  It  is 
not  new. 

B.  We  may  turn  now  to  a  wholly  different  aspect  of  the 
statistical  method,  wherein  it  is  used  for  the  purpose  of 



predicting or estimating the probable or the approximate
condition  in  the  individual  from  a  statistical  examination 
of  the  condition  in  the  mass  or  the  group.  Resort  is  had 
to  the  statistical  method  for  this  purpose  primarily  in  those 
cases  where  the  outcome  of  the  event,  or  the  condition  of 
the  thing,  is  determined  by  the  combined  action  of  a  large 
number  of  small  causes,  each  about  equally  influential 
upon  the  final  result. 

Originally  the  statistical  method  was  only  employed  for 
this  second  purpose  in  cases  where,  because  of  the  multi- 
plicity of  the  cause  groups  involved  in  the  determination 
of  the  event,  and  the  consequently  small  effect  of  each,  it 
was impossible to make any reasonable prediction regarding
an individual from an examination of that individual
alone.  Such  employment  might  be  considered  legitimate, 
though  not  very  fruitful,  on  the  ground  that  prediction 
so  made,  uncertain  and  doubtful  as  it  may  be,  is  after  all 
perhaps  better  than  no  prediction  at  all.  As  time  has  gone 
on,  however,  there  has  been  an  increasing  tendency  to  as- 
sume that  this  use  of  the  statistical  method  had  general 
a  priori  validity  and  could  be  profitably  employed  in  all 
sorts  of  cases.  This  point  of  view  reaches,  it  seems  to  me, 
its limit in the following sentence from Royce. "There
is,  therefore,  good  reason  to  say  that  not  the  mechanical 
but  the  statistical  form  is  the  canonical  form  of  scientific 
theory,  and  that  if  we  knew  the  natural  world  millions 
of  times  more  widely  and  minutely  than  we  do,  the  mor- 
tality tables  and  the  computations  based  upon  a  knowl- 
edge of  averages  would  express  our  scientific  knowledge 
about  individual  events  much  better  than  the  nautical 
almanac  would  do." 

This  leads  us  to  consider  carefully  the  general  question 
of  the  validity  on  the  one  hand,  and  the  usefulness  on  the 



other  hand,  of  this  whole  second  mode  of  employment  of 
the  statistical  method.  It  is  the  one  which  has  attracted 
the  greatest  attention  because  of  its  essentially  spectacular 
nature  coupled  with  a  sort  of  mysteriousness  bordering 
upon  the  miraculous.  It  seems  a  wonderful,  indeed  almost  a 
superhuman, accomplishment to be able to say in the manner
of  the  oracles  of  old,  "So  many  men  will  commit  suicide 
next  year." 

Since Clerk-Maxwell introduced statistical modes of reasoning
into physical science there has been an ever increasing
tendency to regard the universe as organized on a
statistical  plan.  This  has  come  to  carry  with  it  two  im- 
plications, one  of  which  is  quite  fallacious  and  the  other 
partly  so. 

The  first  of  these  is  that  the  individual  events,  of  which 
all  the  causes  are  not  precisely  known  to  us,  are  indetermi- 
nate. Such  an  assumption  is  of  course  unwarranted.  Be- 
cause we  do  not  know  all  the  causes  leading  to  a  particular 
event  does  not  mean  that  that  event  is  any  the  less  pre- 
cisely determined  by  the  course  of  antecedent  events.  Con- 
sider a  box  containing  100  consecutively  numbered  cards. 
Suppose  one  card  were  to  be  drawn  and  that  it  bore  the 
number  36.  It  would  be  quite  impossible  to  formulate 
precisely  all  the  causes  which  led  to  the  drawing  of  the 
number  36  on  the  particular  occasion  considered,  but  it  is 
equally  impossible  to  conceive  that  this  result  was  not  de- 
finitely "caused."  In  other  words,  there  clearly  was  a 
whole  train  of  antecedent  circumstances,  which  taken  all 
together  definitely  resulted,  and  could  only  have  resulted, 
in the drawing of the number 36. The too prevalent conclusion
that the application of the statistical method or
statistical  modes  of  thought  implies  phenomenal  indeter- 
minism  in  the  individual  case  is  totally  fallacious. 



The  second  currently  accepted  implication  of  a  sta- 
tistical view  of  the  universe  is  that  in  general  a  particular 
event  or  phenomenon  is  the  outcome  of  the  combined  ac- 
tion of  a  great  number  of  causes,  each  of  which  alone  pro- 
duced but  a  small  part  of  the  final  total  effect.  There  is 
clearly  so  much  truth  in  this  point  of  view  as  is  included 
in  the  fact  that  individual  events  or  phenomena  do,  in  some 
degree  or  other,  vary,  and  further  these  variations  in  gen- 
eral distribute  themselves  more  or  less  in  accord  with  the 
well-known  laws  of  errors.  But  the  assertion  that  events 
are  individually  the  outcome  of  the  action  of  great  num- 
bers of  causes,  each  of  which  had  a  small  part  and  a  part 
significantly  equal  to  that  played  by  every  other  one  of 
the  causes  concerned  in  the  final  result,  is  only  true  if  the 
"universe  of  discourse"  is  indefinitely  extended  in  time. 
But  practically  science  works  in  a  definitely  and  rather 
narrowly  limited  universe  of  discourse  so  far  as  concerns 
time.  One  of  the  causes  for  the  writing  of  these  lines  is 
that  a  certain  worthy  was  not  shipwrecked  in  voyaging  to 
this  country  nearly  300  years  ago,  since  if  he  had  been  ship- 
wrecked presumably  I  should  not  exist  and  therefore  could 
not  write  these  words.  But  practically  this  cause  had 
very  little  to  do  with  determining  that  I,  being  here  in 
existence,  should  write  this  book  rather  than  do  various 
other  things  which  I  might  have  done  instead.  It  un- 
doubtedly is  true  that  a  vast  number  of  small  causes  do  play 
a  part  in  the  determination  of  any  particular  event.  But, 
in  many  of  the  events,  at  least,  in  which  science  is  inter- 
ested, these  multitudinous  minor  causes  do  not  play  any 
significant  part  in  the  differential  determination  of  a  par- 
ticular event  at  a  particular  instant  of  time.  There  is  in 
connection  with  the  causation  of  most  events  some  one  or 
two,  or  at  most  a  very  few,  outstanding  cause  groups  which, 



for  all  practical  purposes,  at  a  given  moment  completely 
determine  their  occurrence.  The  total  effect  of  all  the  vast 
number  of  other  minor  causes  concerned  in  the  remote  past 
is  so  minute,  as  compared  with  the  part  played  by  the  really 
determinative  ones  at  the  moment,  as  to  be  negligible. 
In  other  words,  all  natural  cause  groups  are  not  small,  nor 
of  equal  (balanced)  values  in  the  final  determination  of  the 
event  to  which  they  relate,  provided  we  confine  ourselves 
to the time limits of finite practical operations. . . .

The  fact  that  all  natural  causes  or  cause  groups  are  not 
equally  significant  quantitatively  is,  of  course,  what  makes 
the  experimental  method  fruitful  —  one  might  even  say  pos- 
sible —  in  science.  The  very  essence  of  the  experimental 
method  is  that  the  conditions  for  the  happening  of  an  event 
are  so  arranged  that  the  influence  of  one  putative  causal 
factor  may  be  tested  at  a  time.  If  with  a  radical  change 
in  this  one  factor,  whilst  all  others  remain,  so  far  as  may 
be,  constant,  no  change  in  the  happening  of  the  event  is 
observed,  the  experiment  has  shown  that  this  particular 
factor  has  no  significant  causal  relation  to  the  happening  of 
the  event.  If  a  marked  change  in  the  happening  of  the 
event  is  observed  always  to  follow  the  change  of  condi- 
tions of  operation  of  the  factor  under  investigation,  then 
clearly  this  factor  plays  a  determinative  part.  In  other 
words,  it  is  a  fundamental  logical  prerequisite  of  the  ex- 
perimental method  if  it  is  to  be  successful  (that  is,  con- 
tribute to  knowledge)  that  it  operate  in  a  universe  in  which 
all  causal  factors  are  not  of  equal  quantitative  significance 
at  any  given  instant  of  time. 

Clearly  experimental  analysis  of  this  sort  would  have 
quickly  discovered,  if  the  common  sense  of  men  had  not 
long  previously  shown,  that  the  course  which  a  particular 
event  is  going  to  take  is  not  immediately  the  result  of  the 



action  of  an  indefinitely  large  number  of  individually  in- 
significant causal  factors,  but  that  it  is  the  outcome  of  the 
action  of  a  few  immediately  determinative  factors  and  the 
effect  of  the  indefinitely  large  number  of  historically  ante- 
cedent small  causes  is  insignificant  in  the  sense  of  being  dif- 
ferential. Generalized,  the  point  may  be  put  in  this  way: 
an event A is about to happen. It may happen in any one
of n different ways, each one of which ways may be designated
by a letter, l, p, r, t, etc. Now an indefinitely large
number of causes are concerned in bringing it about that
the event A is going to happen, and that it can equally
well happen either as l, p, r, t, etc. In other words, the
setting of the stage for the event has involved a vast number
of small and balanced causes. But the causes which
are differential in the particular case, that is, which determine
that A shall happen in the p way this particular time,
and not in the l, the t, or any other way, are, in general :

1.  Few  in  number. 

2.  Immediate  in  time. 

3.  Large  in  relative  quantitative  effect. 

The  point  under  discussion  may  perhaps  be  made  plainer 
by  a  homely  illustration.  Suppose  a  man  steps  up  behind 
a  mule  and  prods  the  creature  with  his  walking  stick.  The 
human  intellect  is  unequal  to  the  task  of  predicting  exactly, 
in  the  particular  case,  what  precise  portion  of  the  man's 
body  the  mule's  hoof  will  land  upon.  A  multitude  of 
minor  causes  will  affect  this  :  The  relative  height  of  the  man 
and  the  mule,  the  age  of  each,  the  place  poked  with  the 
walking  stick,  the  degree  of  fatigue  of  the  mule,  the  tem- 
perature, the  season  of  the  year,  and  countless  other  things 
have  an  influence  in  determining  just  the  precise  spot  where 
the mule's foot and the man's body come together. These



could  be  investigated  statistically  and  tables  drawn  up 
from  which  one  could  predict  the  part  of  the  man  which 
would  most  probably  receive  the  hoof.  But  what  a  silly, 
futile  piece  of  business  this  all  would  be,  since  clearly  the  in- 
fluence of  all  of  these  small  causes  on  what  happens  to  the 
man  is  stupendously  overshadowed  by  the  results  of  two 
factors ;  namely,  putting  himself  behind  a  mule  and  prod- 
ding the  animal  with  a  stick.  Of  course,  a  vast  number  of 
antecedent  causes  are  involved  in  the  setting  of  the  stage, 
but  these  are  not  differential  in  the  determination  of  the 
end  event  of  the  series. 

The  preceding  illustration  has  nothing  directly  to  do  with 
science,  but  the  essential  point  involved  operates  in  the  use 
of  the  statistical  method  as  a  weapon  of  scientific  research. 
This  method  being,  as  we  have  seen  elsewhere,  only  a  de- 
scriptive method,  it  cannot,  any  more  than  any  other  de- 
scriptive method,  tell  us  anything  directly  about  the  causes 
involved in the determination of any events or phenomena
under  consideration.  It  may  be  of  great  aid,  in  combina- 
tion with  the  experimental  method,  in  helping  to  arrive 
at  such  knowledge,  but  alone  and  of  itself  it  cannot  di- 
rectly furnish  knowledge  of  causes  of  individual  events. 
Yet  the  statistical  method,  particularly  in  that  phase  of 
it  which  we  have  here  under  discussion,  which  essays  to 
predict  the  probable  condition  of  the  individual  from  the 
knowledge  of  the  mass,  seems  to  furnish  information  about 
causes.  It  wears  a  specious  air  of  bringing  a  kind  of  knowl- 
edge which  in  reality  it  not  only  never  does,  but  from  the 
very  nature  of  the  case  never  can  furnish. 

Let  us  consider  now  a  little  more  in  detail  the  nature  of 
the  prediction  of  the  probable  condition  of  the  individual 
from a knowledge of the mass or group. It has been shown
in an earlier section that statistics give perfectly definite



and  precise,  and  often  very  useful,  knowledge  about  masses 
or  groups.  We  are  now,  however,  not  concerned  with  this 
as  group  knowledge,  but  rather  with  one  use  to  which  such 
knowledge  has  been  put.  This  use  is  that  which  is  com- 
prised in  the  subject  of  statistical  probabilities,  and  which 
involves  the  drawing  of  conclusions  as  to  the  probable  con- 
dition of  the  individual,  based  on  an  exact  knowledge  of  the 
mass. 

In order to approach the subject in the simplest way
let  us  consider  a  concrete  case.  Suppose  a  problem  of  the 
following  sort  were  to  be  set  before  us  for  answer :  What  is 
the  probability  that,  at  some  chosen  moment  of  time,  the 
next  birth  to  occur  in,  let  us  say,  the  city  of  Baltimore,  will 
be  of  a  white  child.  Now  if  we  look  at  this  as  a  question  in 
statistical  probability  the  appropriate  way,  of  course,  to  go 
about  solving  it  is  to  turn  up  the  registration  reports  for 
the  city  of  Baltimore  covering  a  period  of  years,  and  find 
out  what  is  the  proportion  of  white  to  colored  births  in  that 
city.  Then,  by  the  simplest  theorem  in  the  calculus  of 
chance,  the  probability  that  the  next  birth  will  be  of  a 
white  child  will  be  given  by  a  fraction  of  which  the  numer- 
ator is  the  number  of  white  children  born  in  Baltimore  and 
the  denominator  is  the  total  number  of  children  born 
in  Baltimore,  both  figures  including  the  same  period  of  time. 
The  difference  between  the  fraction  so  obtained  and  1  will 
be  the  probability  that  the  next  birth  will  be  of  a  child  not 
white;  that  is,  colored.  When  we  have  obtained  such  a 
fraction  we  have  a  definite  piece  of  statistical  knowledge, 
but  of  just  what  use  is  it  so  far  as  concerns  the  individual 
case?  It  implies  no  biological  knowledge  of  any  kind; 
no  knowledge  of  the  laws  of  heredity.  It  really  adds  es- 
sentially, it  seems  to  me,  to  the  sum  total  of  the  world's 
knowledge  only  one  thing.     That  thing  is  the  proper  bet- 



ting  odds  on  what  the  color  of  the  next  child  born  in  the 
city  will  be.  This  knowledge  would  really  be  useful,  in  a 
pragmatic  sense,  only  provided  some  one  wishes  to  gamble 
upon  that  event. 
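
The fraction described in this paragraph, and the betting odds which the author says are all it yields, may be worked as in the sketch below. The counts are hypothetical round numbers chosen for easy arithmetic; they are not taken from any registration report.

```python
from fractions import Fraction

# Hypothetical registration counts covering the same period of time.
white_births = 7_200
total_births = 9_000

# Probability that the next birth is of a white child.
p_white = Fraction(white_births, total_births)

# Difference between that fraction and 1: the complementary probability.
p_not_white = 1 - p_white

# The "proper betting odds" in favor of the first event.
odds_for = p_white / p_not_white

print(p_white, p_not_white, odds_for)
```

The computation uses nothing but the two counts, which is the author's point: it carries no knowledge of the causes operating in any individual case.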

Of course the statistical count, on which the probability
is  based,  in  itself  furnishes  definite  and  precise  informa- 
tion about  the  population  of  Baltimore,  as  a  population. 
This  may  be  useful.  What  we  are  now  considering,  though, 
is  knowledge  about  individual  cases. 

Let us see what a totally different kind of ability to predict
the future event in an individual case is gained when
we  take  into  account  one  single  biological  fact  of  an  in- 
dividualistic instead  of  a  statistical  character.  Suppose, 
that  is  to  say,  that  we  are  informed  that  the  mother  of  the 
next  baby  to  be  born  in  Baltimore  is  black.  It  needs  no 
argument  to  show  how  much  more  precise  is  our  prediction 
as  to  the  color  of  the  next  baby  under  these  conditions. 

This  illustration  brings  out  clearly  the  difference  be- 
tween the  two  possible  bases  for  the  prediction  of  a  future 
event.  On  the  one  hand,  such  prediction  may  be  based 
on  statistical  ratios.  This  means  merely  a  count  of  an  in- 
definitely large  past  experience  regarding  the  occurrence 
or  failure  of  the  event,  but  in  no  way  takes  into  account  the 
causes  which  underlie  the  happening  of  the  event  in  any 
particular  case.  On  the  other  hand,  we  have  the  predic- 
tion which  is  based  on  a  definite  knowledge  of  the  deter- 
minative causes  which  bring  about  the  happening  of  a 
particular  individual  event  of  the  sort  in  which  we  are 
interested  and  about  which  we  are  to  predict.  There  can 
be,  it  would  seem,  no  comparison  between  the  usefulness, 
in  the  pragmatic  sense,  of  these  two  kinds  of  knowledge. 
The  statistical  knowledge  on  which  a  statistical  predic- 
tion is  made  is  essentially  the  most  sterile  kind  of  knowledge 



that  one  can  possibly  have  so  far  as  concerns  the  individual 
event.  It  merely  gives  one  the  betting  odds  for  or  against 
the  occurrence  of  an  event,  and  absolutely  nothing  more. 
Now  a  wager,  however  large,  in  the  scientific  sense  neither 
discovers,  expounds,  nor  is  a  criterion  of  the  truth.  Bets, 
in  other  words,  are  not  evidence,  though  the  statistician 
sometimes  seems  to  forget  this,  and  to  deal  with  statisti- 
cal ratios  as  though  they  had  probative  worth  in  regard  to 
phenomena. 

On  the  other  hand,  a  prediction  based  on  experimentally 
acquired  knowledge  of  the  determinative  cause  of  the  in- 
dividual event  brings  with  it  a  real  knowledge  of  a  natural 
phenomenon.  The  predictions  so  made  may  not  always 
turn  out  correct,  but  when  they  do  not,  it  incites  us  to  in- 
vestigate the  particular  disturbing  factor  which  under 
such circumstances may overwhelm the normally determinative
cause of a particular event.

...  If,  as  has  been  suggested,  that  part  of  the  statis- 
tical method  which  uses  the  calculus  of  probability  as  a 
basis  for  the  prediction  of  future  events  gives  only  a  knowl- 
edge of  betting  odds,  one  may  ask :  what  about  the  whole 
concept  of  probable  error?  The  value  of  this  concept  in 
scientific  research  is  unquestioned.  Yet  plainly  the  whole 
concept has its basis in the calculus of probability. Has not
our  discussion  led  us  unwittingly  into  a  serious  contradiction  ? 

I think not. Let us examine the probable error concept
a little more carefully than we have yet done. Suppose
we read that the mean length of the thorax of a thousand
fiddler crabs is 30.14 ± .02 mm. Just what does
this  actually  mean?  Accepting  the  figures  at  their  face 
value,  or,  put  another  way,  assuming  that  the  mathemat- 
ical theory  on  which  the  probable  error  was  calculated  was 
the  correct  one,  the  figures  mean  something  like  this :    If 



one  were  to  take,  quite  at  random,  successive  samples  of 
1000  each  from  the  total  population  of  fiddler  crabs  and 
determine  the  mean  thoracic  length  from  each  sample,  these 
means  would  all  be  different  from  each  other  by  varying 
amounts.  In  other  words,  no  single  sample  would  give 
us  the  absolutely  true  value  of  the  mean  thoracic  length 
of the whole fiddler crab population. This true value is
in  an  absolute  sense  unknowable,  because,  for  one  reason, 
always  we  must  come  at  the  finding  of  it  by  the  way  of 
random  sampling,  and  sampling  means  variation.  Now 
it  is  an  observed  fact  of  experience  that  the  variations  due 
to random sampling distribute themselves according to a
definite  law  of  mathematical  probability.  Knowing  this 
law,  it  is  clearly  possible  to  state  the  mathematical  prob- 
ability for  (or  against)  any  particular  deviation  or  varia- 
tion occurring  as  the  result  of  random  sampling.  Exactly 
this  is  what  the  probable  error  does.  It  says,  in  the  par- 
ticular case  here  considered,  that  it  is  an  even  chance, 
that  a  deviation  or  variation  in  the  value  of  the  mean  as 
great  as  or  greater  than  .02  mm.  above  or  below  will  occur 
as  a  result  of  random  sampling.  Or,  put  in  another  way, 
if  we  took  successive  samples  of  1000  each  from  this  crab 
population,  it  is  an  even  bet  that  the  value  of  the  mean 
from any sample would fall between 30.14 + .02 = 30.16
and 30.14 - .02 = 30.12.
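
The even-chance reading of the probable error can be checked by a small simulation. The sketch below is illustrative only. It assumes a normally distributed crab population whose standard deviation (about 0.94 mm.) is chosen so that the probable error of the mean of 1000 crabs comes out at .02 mm.; neither the normal form nor that standard deviation is given in the text, and the function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean

# About 0.6745 standard errors on either side of the mean enclose half of a normal curve.
PE_FACTOR = NormalDist().inv_cdf(0.75)

def probable_error_of_mean(sd, n):
    """Probable error of the mean of n independent observations."""
    return PE_FACTOR * sd / n ** 0.5

def share_within_probable_error(true_mean, sd, n, trials, seed=0):
    """Fraction of simulated sample means lying within one probable error of true_mean."""
    rng = random.Random(seed)
    pe = probable_error_of_mean(sd, n)
    hits = 0
    for _ in range(trials):
        sample_mean = fmean(rng.gauss(true_mean, sd) for _ in range(n))
        if abs(sample_mean - true_mean) <= pe:
            hits += 1
    return hits / trials

# Assumed population: mean 30.14 mm., sd picked so that the probable error is .02 mm.
ASSUMED_SD = 0.02 * 1000 ** 0.5 / PE_FACTOR
share = share_within_probable_error(30.14, ASSUMED_SD, n=1000, trials=2000)
print(round(probable_error_of_mean(ASSUMED_SD, 1000), 4))
print(share)  # should lie close to one half
```

Under these assumptions roughly half of the simulated means fall between 30.12 and 30.16 mm., which is all that the statement "30.14 ± .02" asserts.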

Now  all  the  knowledge  that  this  probable  error  fur- 
nishes is  this  :  that  if  a  man  were  to  say,  "I'll  bet  a  thousand 
dollars  that  the  mean  thoracic  length  of  the  next  sample 
of  fiddler  crabs  you  measure  will  be  either  over  30.16  mm. 
or  under  30.12  mm.,"  one  would  not  be  justified  in  offering 
odds.  He  could  wager  on  even  terms.  Either  party  in- 
volved in  the  transaction  would  be  as  likely  to  lose  (or  to 
win)  as  the  other. 


Putting the case in this way, it is clear that this is the
same  kind  of  knowledge  which  comes  from  an  examination 
of  probable  errors  as  that  discussed  in  the  preceding  sec- 
tion. It  is  a  knowledge  of  betting  odds.  It  has  no  nec- 
essary relation  per  se  to  any  physical,  chemical,  or  biolog- 
ical laws.  It  merely  informs  one  how  he  may  safely  gamble 
on  an  event  if  he  is  so  minded  and  can  find  some  one  else 
ready  to  do  the  same  thing. 

Wherein  lies  the  value  of  the  probable  error  concept  for 
science,  then?  Simply  in  that  it  serves  as  a  test  or  check 
on  every  mode  of  research  in  science.  So  far  as  I  can  see, 
the  calculus  of  probability,  in  and  of  itself  alone,  is  not  and 
never  can  be  an  effective  weapon  of  research  for  the  dis- 
covery of  truth  in  phenomenal  science,  be  it  physical  or 
biological.  Yet  it  operates  as  an  ever-present  test  of  the 
trustworthiness  of  the  results  obtained  by  modes  of  re- 
search which  are  in  themselves  adapted  to  making  dis- 
coveries about  phenomena.  The  student  of  probability 
says  something  like  this  to  the  experimentalist:  "Yours 
is  the  way  to  find  out  the  significant  underlying  causes 
of  phenomena.  Let  it  be  practiced  with  all  zeal,  but  let 
it  be  remembered  that  you  operate  in  a  finite  way  in  a 
finite universe, and consequently all your results are subject
to such fluctuations and variations as experience has
shown  arise  from  random  sampling.  I  regret  that  I  cannot 
directly  and  alone  discover  significant  causes,  but  at  any  rate 
I  can  furnish  you  a  test  whereby  you  may  reasonably  judge 
whether  your  result  is  significantly  influenced  by  these 
fluctuations  of  random  sampling." 

To  sum  the  whole  matter  up :  I  have  tried  to  show  that 
the  statistical  method  in  science  has  been  used  to  do  two 
things. 

The first of these is a unique function of the method:
to  furnish  a  description  of  a  group  of  objects  or  events  in 
terms  of  the  group's  attributes  rather  than  those  of  the 
individuals composing the group. Herein lies the great
value  of  the  statistical  method.  It  is,  however,  a  descrip- 
tive method  only  and  has  the  limitations  as  a  weapon  of 
research  which  that  fact  implies. 

The  second  purpose  that  the  statistical  method  has 
been  called  upon  to  accomplish  is  the  prediction  of  the  in- 
dividual case  from  a  precise  knowledge  of  the  group  or 
mass.  This  involves  something  really  additional  to  the 
statistical  method  per  se;  namely,  the  mathematical  theory 
of  probability.  We  have  seen  that  this  side  of  the  statis- 
tical method  gives  only  a  somewhat  sterile  kind  of  knowl- 
edge so  far  as  concerns  individuals ;  namely,  a  knowledge 
of  betting  odds.  The  theory  of  probability  grew  up  about 
the gaming table, not in the laboratory. Its place in the
methodology  of  science  is  not  an  independent  one.  By 
it  alone  one  cannot  discover  new  truths  about  phenomena. 
But  it  is  a  highly  important  adjunct  to  other  modes  of  re- 
search. 

Plainly,  however,  one  cannot  regard  statistical  knowl- 
edge in  general  as  a  higher  kind  of  knowledge  than  that 
derived  in  other  ways.  Nor  is  the  statistical  method  to 
become  the  dominant  or  exclusive  method  of  science, 
though  it  will  always  be  useful,  and  in  many  fields  an  es- 
sential method.  It  will  find  its  chief  usefulness,  first  in  its 
sphere  of  furnishing  shorthand  descriptions  of  groups,  and 
second  in  furnishing  a  test  of  the  probable  reliability  of 
conclusions. 

REVIEW 

1.  What  are  the  two  sorts  of  knowledge  about  things  or  events 
which  statistical  methods  help  to  secure? 


2. In what sense may the arithmetic mean be said to be "precise
and exact," that is, a reality in somewhat the same sense as is the
mode? In what sense does it become very inexact and unprecise?

3.  What  assumptions  are  made  when,  from  the  use  of  statistical 
methods, an attempt is made to predict "the probable condition of
the individual from a knowledge of the mass or group"?

4.  How  does  Dr.  Pearl  illustrate  the  problem  of  the  condition  of 
the individual from a knowledge of the group in re likelihood of the
birth  of  a  black  child  in  Baltimore? 

5. What is the probable error and what is its function?


The Horizontal Zero in Frequency Diagrams ¹

It  is  a  generally  accepted  rule  of  graphic  presentation 
that  a  zero,  used  in  a  diagram  as  a  point  of  reference,  should 
be  included  in  the  diagram.  This  rule,  while  it  is  observed 
in  most  statistical  work,  is  almost  universally  disregarded 
in  the  drafting  of  frequency  diagrams. 

Diagram  1,  presented  herewith,  is  a  frequency  graph 
of a common type, based on the weights of 738 men.²
Weights  are  indicated  on  the  base  line,  and  the  per  cent 
of  cases  corresponding  to  any  given  weight  is  proportionate 
to  the  vertical  distance  from  the  base  line  to  the  curve. 
A  zero  line  is  the  most  conspicuous  feature  of  this  dia- 
gram, but  inspection  of  the  figure  shows  that  the  presen- 
tation implies  two  zeros,  and  that  only  one  of  these  is  shown. 
The  vertical  scale,  representing  percentages,  begins  at  the 
zero  base  line,  but  the  horizontal  scale,  representing  weights, 
begins at 90 pounds. It is the purpose of this paper to
state reasons for including the horizontal zero, to direct
attention to a type of frequency diagrams to which these
reasons do not apply, and to illustrate methods of drafting.

¹ Adapted with permission from Clark, Earle, "The Horizontal Zero in
Frequency Diagrams," Quarterly Publications of the American Statistical
Association, June, 1917, pp. 662-669.

² The data are for 738 men born in Wales, as shown in Yule's "Introduction
to the Theory of Statistics," p. 95. For convenience in presentation,
the extremes of the distribution have been arbitrarily shortened.

A frequency diagram is plotted for the purpose of showing
the significant facts about a series of variables. The
graphic form is used rather than a frequency table or text
statement because most people, even most statisticians,
find it easier to perceive and appreciate these significant
facts by looking at a diagram than by studying a column
of figures. The essential facts about a variable series are:
(1) the mean, median, or other measure of central tendency,
and (2) the distribution of the values about this
central tendency. These facts are interdependent. It
is a simple matter to compute medians or means, but these
measures do not reveal the whole truth about a distribution;
they may be seriously misleading unless shown
in relation to the distribution of the individual values.

[Diagram 1. Weights of 738 Men, Shown without Horizontal Zero.
Vertical scale: per cent, from 0 to 40. Horizontal scale: pounds,
from 90 to 270.]

On the other hand, the distribution is not in itself significant
unless related to the central tendency. Stated
in  pounds  and  ounces,  the  average  deviation  of  the  weights 
of  a  group  of  1000  elephants  would  doubtless  be  far  greater 
than  the  average  deviation  of  the  weights  of  1000  canary 
birds,  but  this  would  not  necessarily  mean  that  the  weights 
of  elephants  are  relatively  more  variable  than  the  weights 
of  canary  birds.  In  order  to  determine  the  true  variability 
of  a  series  it  is  necessary  to  relate  the  measure  of  disper- 
sion to  the  measure  of  central  tendency.  This  may  be  done 
by  computing  a  coefficient  of  dispersion  —  a  ratio  which 
expresses  the  dispersion  as  a  proportion  of  the  measure 
of  central  tendency. 
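
A minimal sketch of the computation just described, taking the average deviation about the median as the measure of dispersion and the median as the measure of central tendency. The elephant and canary weights are invented for illustration, and the function names are hypothetical.

```python
from statistics import median

def average_deviation(values):
    """Mean absolute deviation of the values from their median."""
    m = median(values)
    return sum(abs(v - m) for v in values) / len(values)

def coefficient_of_dispersion(values):
    """Average deviation expressed as a proportion of the median."""
    return average_deviation(values) / median(values)

# Invented weights in pounds, not data from the text.
elephants = [9000, 9500, 10000, 10500, 11000]
canaries = [0.03, 0.04, 0.05, 0.06, 0.07]

for name, weights in (("elephants", elephants), ("canaries", canaries)):
    print(name, round(average_deviation(weights), 3),
          round(coefficient_of_dispersion(weights), 3))
```

With these made-up figures the elephants show by far the larger average deviation in pounds but the smaller coefficient, which is the distinction the paragraph draws.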

It  follows  that,  if  a  frequency  diagram  is  to  serve  the 
purpose  for  which  it  is  intended,  it  must  show,  with  all 
possible  clearness  and  effectiveness,  the  distribution  of  the 
individual  values,  the  central  tendency,  and  the  relation 
of  the  distribution  to  the  central  tendency.  Diagram  1 
shows  the  distribution  of  the  measures.  Does  it  also  show, 
with  the  emphasis  required,  the  two  other  essential  facts? 

On  Diagram  1  the  median  is  indicated  in  the  usual  way 
—  by  a  vertical  line  dividing  into  two  equal  parts  the  sur- 
face of  the  figure  inclosed  by  the  curve  and  the  base  line. 
This  line  is  sometimes  referred  to  as  the  median  line,  but 
the  designation  does  violence  to  the  principles  of  graphic 
presentation.  In  diagrams,  lines  or  areas  are,  or  should 
be,  proportionate  to  the  quantities  they  represent.  The 
length  of  the  so-called  "median  line"  is  not  proportionate 
to  the  median  weight  of  men ;  it  is  proportionate  rather, 
as  the  class  interval  for  the  distribution  is  20  pounds,  to 
the  approximate  number  of  men  whose  weights  fall  within 
limits  fixed,  respectively,  at  10  pounds  below  and  at  10 
pounds  above  the  median  weight.  The  line  represents, 
in  other  words,  not  the  median  value  for  the  series,  but  a 
number of cases. There is nowhere on the diagram a line
representing  by  its  length,  or  a  surface  representing  by  its 
area,  the  median  weights  of  the  men. 

The median can be determined, it is true, by referring
to the scale at the foot of the figure. As the point of intersection
of the so-called "median line" with the base line
falls at 156 pounds, as indicated by the horizontal scale,
it follows that this value is the median, but the result is not
obtained by the graphic method. The figures on the
scale are not graphic representations any more than are
the figures of a table or a text statement.

[Diagram 2. Weights of 738 Men, Shown with Horizontal Zero.
Vertical scale: per cent, from 0 to 40. Horizontal scale: pounds,
from 0 to 270.]

The  median  can,  however,  be  shown  by  the  graphic 
method  by  so  extending  the  base  line  that  the  horizontal 
scale  will  include  the  zero.  This  method  has  been  followed 
in  preparing  Diagram  2.  In  Diagram  2  the  horizontal 
distance  from  the  vertical  line  at  the  left  of  the  figure  to 
the so-called "median line," measured on the base line or
along  any  abscissa,  represents  the  median  weight  of  the 
men. 


If the inclusion of the horizontal zero is required for a
complete  graphical  representation  of  the  median,  it  is  even 
more  essential  as  a  means  of  showing  the  relationship  of 
the  dispersion  to  the  median.  As  Diagram  1  contains  no 
graphical  representation  of  the  central  tendency,  it  fol- 
lows that  it  affords  no  graphical  representation  of  the  re- 
lation between  the  central  tendency  and  the  dispersion. 
The  dispersion  of  the  series  is  indicated  by  the  form  of  the 
curve  and  also  by  a  line  beneath  the  base  line,  propor- 
tionate in  length  to  the  average  deviation  (14.2  pounds), 
drawn  to  scale  and  extending  to  the  left  of  the  median. 
By  including  this  line,  the  dispersion  is  reduced  to  a  single 
graphical  expression,  but  the  diagram  contains  no  graphi- 
cal representation  of  the  median  with  which  either  the 
line  or  the  curve  can  be  compared. 

An  effective  graphical  representation  of  the  relation- 
ship between  the  central  tendency  and  the  distribution  is 
found  in  Diagram  2,  in  which  the  median,  represented  by 
the  distance  between  the  horizontal  zero  and  the  vertical 
"median  line,"  can  be  compared  both  with  the  surface 
of frequency, as indicated by the curve, and with the line
representing  the  average  deviation.  The  ratio  of  the  length 
of  this  line  to  the  distance  from  the  horizontal  zero  to  the 
median  line  is  equivalent  to  the  coefficient  of  dispersion. 

The difficulties arising from the omission of the horizontal
zero are further illustrated in Diagram 3, in which
the weights of the 738 men are compared with the weights
of 279 thirteen and fourteen-year-old school boys.³

³ The data, which are for boys attending the Worcester, Mass., public
schools, are from a report by Franz Boas and Clark Wissler, published in
the report of the U. S. Commissioner of Education for 1904.

[Diagram 3. Weights of 738 Men and 279 Boys, Shown without
Horizontal Zeros. Figure A: Men. Figure B: Boys. Vertical scales:
per cent. Horizontal scales: pounds.]

In Diagram 3 the scales for pounds are identical in both
figures. The appearance of the diagram suggests that
the two distributions are very much alike; as the figure
for men has a greater spread at the base line than that
for boys it would seem that the former represents, if anything,
the wider dispersion. This impression is not borne
out by the data. The actual dispersion (average deviation)
is, roughly, the same for the two series: 14.2 pounds
for the men and 14.3 pounds for the boys. But as the
median for the men is 156.3 pounds, and that for the boys
90.8 pounds, computation shows that the significant measure
of relative variability, the coefficient of dispersion, is
.157 for the boys and only .091 for the men. In other
words, the dispersion of the weights of the boys is 15.7 per
cent of the median weight of boys, while for the men the
dispersion of the weights is but 9.1 per cent of the median
weight of men. The apparent similarity of the two distributions
represented in Diagram 3 is, therefore, accidental
and the diagram is misleading.
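
The two coefficients can be recomputed from the figures quoted in the paragraph above. This is only an arithmetic check; the dictionary layout and the function name are illustrative choices, not anything prescribed by the paper.

```python
# Figures quoted in the text: average deviation and median, in pounds.
series = {
    "men": {"average_deviation": 14.2, "median": 156.3},
    "boys": {"average_deviation": 14.3, "median": 90.8},
}

def coefficient_from_summary(average_deviation, median):
    """Average deviation as a proportion of the median."""
    return average_deviation / median

coefficients = {
    name: round(coefficient_from_summary(**figures), 3)
    for name, figures in series.items()
}
print(coefficients)  # {'men': 0.091, 'boys': 0.157}
```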

It  may  be  said  that  any  one  using  Diagram  3  could  de- 
termine the  relative  dispersions  by  a  study  of  the  figures 
of  the  scales;  that  the  scales  show  the  medians,  and  that 
it  is  not  impossible  to  relate  these  medians  to  the  disper- 
sions. This  is  true,  but,  as  the  same  facts  can  be  deter- 
mined from  a  frequency  table,  the  argument  offered  is 
merely  an  argument  for  not  using  graphical  representa- 
tions for  comparing  two  or  more  series  of  variables. 

Diagram  4  shows  in  graphic  terms  the  true  relationship 
between  the  dispersions.  The  base  lines  of  Figures  A  and  B 
of  this  diagram  have  been  carried  out  to  zero,  and  the  scales 
have  been  so  adjusted  that  the  distance  from  zero  to  the 
median  is  the  same  in  both  figures.  It  is  now  possible  to 
view  the  dispersions  in  their  relationship  to  the  central 
tendencies.  The  lines  representing  the  average  deviations, 
as  well  as  the  contours  of  the  curves,  show  very  clearly 
that  the  weights  of  boys  are  much  more  widely  dispersed 
than  the  weights  of  men. 
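
The adjustment of scales described for Diagram 4 amounts to choosing, for each figure, a number of pounds per unit of drawing length such that the median falls at the same distance from zero. A rough sketch follows; the common median distance of 100 drawing units is an arbitrary assumption, as are the function names.

```python
MEDIAN_DISTANCE = 100.0  # assumed drawing units from zero to the median line

def pounds_per_unit(median_pounds):
    """Horizontal scale that places the median MEDIAN_DISTANCE units from zero."""
    return median_pounds / MEDIAN_DISTANCE

def drawn_length(pounds, median_pounds):
    """Length on the drawing of a horizontal distance given in pounds."""
    return pounds / pounds_per_unit(median_pounds)

# Medians and average deviations quoted in the text, in pounds.
men_line = drawn_length(14.2, 156.3)
boys_line = drawn_length(14.3, 90.8)
print(round(men_line, 1), round(boys_line, 1))  # 9.1 15.7
```

On the common scale the average-deviation line for the boys is roughly 1.7 times as long as that for the men, although the two deviations are nearly equal when stated in pounds.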

[Diagram 4. Weights of 738 Men and 279 Boys, Shown with
Horizontal Zeros. Figure A: Men. Figure B: Boys. Vertical scales:
per cent. Horizontal scales: pounds, beginning at zero.]

The fact that in Diagram 4 the surface inclosed by the
curve and base line of Figure B is much greater than that
inclosed by the curve and base line of Figure A might lead
an incautious observer to assume that the dissimilarity in
the appearance of the figures is due to a difference in the
number of observations, that is, that the number of boys
exceeds the number of men. Such an inference would be
unwarranted. As numbers have been reduced to percentages,
100 per cent is the total for each group. The
values are plotted upon the ordinates; hence, the spaces
between the ordinates, and the areas inclosed by the curves
and the base lines, are without significance. It is believed
that the diagram affords a correct interpretation of the
data; that it gives an impression of two groups of which
one is somewhat closely clustered about its central tendency,
while the other is much more widely dispersed.

It  should  be  noted  that  there  is  an  important  group  of 
frequency  diagrams  to  which  the  arguments  in  favor  of 
including  the  horizontal  zero,  which  have  been  stated  in 
the  preceding  pages,  do  not  apply.  These  are  diagrams 
of  distributions  in  which  the  zero  cannot  be  exactly  located. 
In  the  so-called  normal  frequency  distribution  the  base 
line  and  the  ends  of  the  curve  are  in  asymptote  —  the  ends 
and  the  base  line  are  tangent  at  infinity.  It  follows  that, 
in  plotting  probabilities,  or  results  in  the  psychological 
field  which  are  based  not  upon  concrete  measurements  but 
upon rankings, the horizontal zero cannot be shown.

But  it  is  also  impossible  to  show  a  zero  based  upon  data 
of  this  kind  in  any  type  of  diagram,  and  this  is  true  whether 
the  zero  is  vertical  or  horizontal.  If  the  horizontal  zero 
cannot  be  shown  in  a  frequency  diagram  representing  the 
distribution  of  schoolboys  with  reference  to  a  given  mental 
trait,  as  determined  by  the  rankings  of  competent  judges, 
neither  can  a  zero  be  shown  in  a  diagram  in  which  the 
ability  of  any  one  of  these  boys  at  successive  tests  is  indi- 
cated by  a  historical  curve.  It  is  possible  to  present  a 
horizontal  zero  in  a  frequency  diagram  for  any  data  for 
which  a  vertical  zero  for  an  ogive  curve  can  be  shown. 

A  practical  objection  to  the  inclusion  of  the  horizontal 
zero is the fact that additional space is required. But this
objection  is  no  more  applicable  to  the  horizontal  zero  in 
frequency  diagrams  than  to  the  vertical  zero  in  line  diagrams. 
The  inclusion  of  the  vertical  zero  in  diagrams  of  the  latter 
type is the established practice. And an inspection of the
diagrams  presented  with  this  paper  makes  it  clear  that  the 
inclusion  of  the  horizontal  zero  presents  no  serious  diffi- 
culties. A  case  will  occasionally  be  encountered  in  which 
the  dispersion  constitutes  so  small  a  proportion  of  the 
central  tendency  that  the  zero,  whether  horizontal  or  verti- 
cal, must  be  omitted,  but  such  cases  are  most  exceptional. 
The  arguments  and  the  illustrations  presented  in  the 
preceding pages seem to support the following conclusions:
In frequency diagrams, where the position of the
horizontal  zero  is  exactly  ascertainable,  and  where  the 
dispersion  is  not  too  small  in  proportion  to  the  measure  of 
central  tendency,  the  horizontal  zero  should  be  included 
in  the  diagram.  This  means  that  the  horizontal  zero  should 
be  included  in  a  frequency  diagram  in  all  cases  in  which  a 
zero  for  similar  data  would  be  included  in  any  type  of 
diagram.  Without  the  horizontal  zero  the  frequency  dia- 
gram does  not  afford  a  complete  graphical  representation 
of  the  central  tendency  nor  of  the  relationship  of  the  cen- 
tral tendency  to  the  distribution. 

REVIEW   PROBLEMS 

Dispersion  and  Skewness 

1.    Dispersion. 

(1)  Using  the  data  in  Chapter  VIII,  pp.  348,  for  expenditures 
for  breakfast,  dinner,  and  supper,  express  both  absolutely  and 
relatively  the  dispersion  in  different  expenditure  series  by  the 
cumulative  or  moving-range  method.  Put  your  data  in  the 
form  of  a  single  table.  (See  Text,  page  383.)  Reduce  the  measures 
of  dispersion  to  coefficients.     Relatively  how  do  the  series  compare? 


(2)  Average  Deviation. 

Using  the  data  as  in  (1)  above,  compute  the  average  deviation  by 
the  short-cut  method.  Arrange  the  data  in  the  form  of  tables. 
(See  Text,  pages  396-398.)  Test  your  result  by  computing  the 
average  deviation  from  the  true  average.  Reduce  your  measures 
of  dispersion,  based  upon  the  average  deviations,  to  coefficients. 
Relatively  how  do  the  distributions  stand? 

(3)  Using  the  data  as  in  (1)  above,  compute  the  standard  devia- 
tions. Arrange  your  data  in  the  form  of  tables.  (See  Text,  pages 
404-405.)  Compare  the  standard  and  average  deviations.  Do 
the  contentions  in  the  Text,  pages  402-403  and  406,  seem  to  be  borne 
out?  Reduce  the  measures  of  dispersion  based  upon  the  standard 
deviation  to  coefficients.  Relatively  how  do  the  distributions 
stand?

(4) Quartile Deviation.

Using  the  data  as  in  (1)  above,  compute  both  the  quartile  measures 
and  coefficients  of  dispersion.  Compare  the  quartile  measures  and 
coefficients  of  dispersion  with  those  based  on  the  standard  and 
average  deviations.  Arrange  your  comparison  in  the  form  of 
tables.  Do  the  contentions  respecting  the  quartiles,  found  on 
pages  408-409  of  the  Text,  seem  to  be  borne  out? 

2.  Skewness. 

Using  the  data  in  (1)  above,  compute  the  quartile  measures  and 
coefficients  of  skewness,  and  the  coefficients  based  upon  the  standard 
deviations.  Is  the  rule  on  page  417  of  the  Text  respecting  the  posi- 
tions of  averages  borne  out  in  these  cases?  What  variations  are 
there  from  the  ideal? 

3.  Dispersion  and  Skewness. 

Formulate  a  general  statement  summarizing  the  functions  and 
merits  in  statistical  analysis  of  measures  and  coefficients  of  dis- 
persion and  skewness.  Illustrate  the  points  made  by  referring  to 
your  results  in  the  above  problems.  Revise  your  answer  to  Prob- 
lem III  on  Tabulation  in  the  light  of  your  measures  and  coefficients 
of  dispersion  and  skewness. 


CHAPTER  X 
COMPARISON  —  CORRELATION 

The Limits of Statistics ¹

...  It  is,  however,  a  fact  too  well  recognized  to  re- 
quire specific  illustration  that  statistics,  on  its  objective 
and  mathematical  side,  presents  at  best  but  a  rearrange- 
ment of  the  data.  The  data,  thus  marshaled,  cannot  in 
themselves  provide  a  solution  to  any  social  problem :  they 
merely  constitute  a  problem.  In  fact,  the  most  signal 
merit  of  statistics  consists  perhaps  in  the  very  aptitude 
of  that  method  to  bring  to  the  surface  problems  which  other- 
wise might  never  be  recognized.  But  the  solution  of  such 
problems  can  only  be  reached  within  the  level  to  which 
the  data  themselves  belong,  and  thus  falls  to  the  lot  of  the 
sciences  representing  the  conceptualizations  of  the  par- 
ticular set  of  data,  whether  this  be  biology,  or  psychology, 
or  sociology.  There  is  thus  good  common  sense  in  the 
popular  saying  that  statistics  can  be  made  to  prove  any- 
thing, implying  that  it  is  the  interpretation  of  the  statisti- 
cal material  which  counts,  and  that,  if  the  interpretation 
is  arbitrary,  the  mathematical  garb  of  the  data  is  no  guar- 
antee of  truth. 

¹ Adapted with permission from Goldenweiser, A. A., "History, Psychology
and Culture," in Journal of Philosophy, Psychology and Scientific Methods,
October 10, 1918, pp. 567-568.

Difficulties  in  International  Statistical 
Comparisons ¹

.  .  .  The  various  kinds  of  difficulties  may  be  broadly 
classified  as  those  due  to : 

(1)  Inadequate  definition ; 

(2)  Non-identity  of  definition  ; 

(3)  Absence  of  information  showing  in  what  particulars 
unlike  definitions  really  differ  ; 

(4)  Differences  in  the  periods  of  time  for  which  statis- 
tical returns  are  collected.  This  is  really  a  special  case 
of  differences  in  definition,  but  it  is  important  enough 
to  deserve  special  mention ; 

(5)  Differences  in  the  classification  of  statistics  —  an- 
other special  and  important  case  of  differences  in  defini- 
tions ; 

(6)  Varying  degrees  of  incompleteness  of  statistics  cover- 
ing the  same  subject-matter.  This  case  has  an  extensive 
aspect,  where  the  statistics,  though  complete  so  far  as  they 
go,  do  not  cover  the  whole  ground.  .  .  .  There  is  also  an 
intensive  aspect,  where  the  statistics,  though  nominally 
covering  the  whole  ground,  are  incomplete  through  faulty 
collection.  .  .  . 

(7)  Lack  of  particular  kinds  of  information  necessary 
to  a  complete  comparison ;  and 

(8)  Absolute  incomparability,  arising  from  what  may 
be  called  organic  differences  in  the  subject-matter,  as  dis- 
tinct from  the  deficiencies  in  the  statistics  relating  to  that 
subject-matter. 

¹ Adapted with permission from Weber, Augustus D., "Notes on Some
Difficulties  Met  with  in  International  Statistical  Comparisons,"  in  Journal 
of  the  Royal  Statistical  Society,  Vol.  73,  1910,  pp.  10-11. 


Difficulties in International Comparison of
Wages ¹

A  class  of  statistics  .  .  .  presenting  some  of  the  greatest 
difficulties  in  comparisons,  and  yet  one  with  respect  to  which 
comparisons  are  frequently  made,  is  the  class  of  wages 
statistics.  Here  it  is  a  case  of  definition  in  the  widest  sense. 
What  are  wages  ?  From  current  popular  hterature  one  might 
suppose  they  were  a  rate  of  money  per  hour,  or  per  day,  or 
per  week,  with  no  suggestion  that  such  a  rate  may  be  a  "stand- 
ard rate,"  or  the  arithmetical  average  of  a  number  of  rates 
actually  paid,  or  the  "modal"  rate  actually  paid,  or  the  rate 
in  a  particular  locality,  or  any  one  of  a  number  of  such  things. 
It  may  happen  that  the  only  rates  published  are,  for  a  certain 
trade  in  one  country,  actual  earnings,  and,  in  another  country, 
the  standard  rates.  .  .  ,  How  are  these  to  be  compared 
without  knowing  the  relation  of  actual  earnings  to  standard 
rates  in  one  country  or  the  other?  But  the  money  rate  per 
unit  of  time  or  work,  whether  standard  or  any  other  rate, 
is  after  all  the  least  -important  thing  about  wages.  If  the 
French  artisan  earning  Sd.  per  hour  is  as  strong  and  healthy, 
as  well  fed,  clothed,  and  housed,  if,  in  a  word,  he  has  his  eco- 
nomic wants  as  satisfactorily  met  as  the  English  artisan  getting 
lOd.  an  hour,  can  it  be  really  maintained  that  economically 
the  Frenchman  is  more  badly  paid  or  is  worse  oflf  than  the 
Englishman?  Wages,  in  fact,  from  the  international,  if 
from  no  other,  point  of  view,  are  not  money  rates,  but  eco- 
nomic goods,  tangible  and  otherwise,  which  the  worker  can 
and  does  get  in  return  for  his  labor,  and  wages  in  different 
countries  can  only  be  properly  compared  when  expressed 
in  terms  of  economic  goods,  and  allowance  made  for  the 

'  Adapted  with  permission  from  Weber,  Augtistus  D.,  "Notes  on  Some 
Difficulties  Met  with  in  International  Statistical  Comparisons,"  in  Journal 
of  the  Royal  Statistical  Society,  Vol.  73,  1910,  pp.  17-19. 


COMPARISON  —  CORRELATION  399 

different  marginal  values  which  the  same  goods  may  possess 
to  different  individuals  or  at  least  to  different  commu- 
nities. It  is,  of  course,  well  known  that  wages  statistics  are 
not  and,  in  the  present  state  of  our  knowledge,  cannot  be 
expressed  in  this  way.  An  approximation  to  it  is,  how- 
ever, afforded  by  the  method  of  correcting  money  wages  by, 
or  rather  interpreting  them  in  the  light  of,  what  is  called  the 
cost  of  living.  Statistics  of  the  cost  of  living  of  particular 
classes  in  certain  countries  are  growing  in  volume,  though 
they  are  still  too  inadequate  to  permit  of  anything  like  an 
exact  interpretation  and  comparison  of  money  wages  in 
terms  of  "real"  wages.  The  most  important  recent  con- 
tribution to. these  statistics  are,  so  far  as  I  am  aware,  the 
reports  by  our  Board  of  Trade  on  cost  of  living  in  British, 
French,  and  German  towns,  while  the  United  States  Labor 
Department  at  Washington  has  issued  valuable  reports  on 
cost  of  living  in  the  States.  From  the  Board  of  Trade  re- 
ports referred  to  we  find,  e.g.  that  while  money  wages  in 
England,  France,  and  Germany  may  be  in  the  proportion  of 
100 : 75 : 83, such wages when interpreted in the light of
the  cost  of  fuel,  rent,  and  food  in  the  respective  countries, 
may  be  found  to  be  in  the  ratio  of  100:67:71.  These 
figures  may  be  but  very  rough  approximations  to  the  true 
level  of  "real"  wages  in  the  countries  compared,  but  if  the 
data  on  which  they  are  based  are  fairly  extensive  or  form  a 
good  sample  from  which  to  estimate  the  cost  of  living,  they 
are  much  better  than  the  level  of  money  wages,  and  it  is  to 
be  desired  that  authentic  and  detailed  information  on  cost 
of  living  in  all  civilized  countries  may  be  collected  and 
published. 
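
The relation between the two sets of ratios quoted above can be made explicit with a little arithmetic. The sketch below assumes, as a simplification not stated in the Board of Trade reports themselves, that a "real" wage index is the money wage index divided by a cost-of-living index; on that assumption the quoted figures imply a cost of living of roughly 112 in France and 117 in Germany, with England as 100. The function name and the dictionary layout are illustrative only.

```python
# Index numbers quoted in the text (England = 100).
money_wages = {"England": 100, "France": 75, "Germany": 83}
real_wages = {"England": 100, "France": 67, "Germany": 71}


def implied_cost_of_living(money, real):
    """Cost-of-living index (England = 100) implied by real = money / cost.

    The deflation rule is an assumption made for illustration; the reports
    weighed fuel, rent, and food in their own way.
    """
    return {country: 100.0 * money[country] / real[country] for country in money}


cost = implied_cost_of_living(money_wages, real_wages)
for country, index in cost.items():
    print(f"{country}: implied cost of living = {index:.0f}")
```

Read the other way, dividing each money wage by the implied index returns the "real" ratio of 100 : 67 : 71, which is all that the correction amounts to when the cost-of-living data are taken at face value.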

But  even  with  such  additional  information,  the  correct 
comparison  of  international  wages  statistics  is  impossible 
without  a  knowledge  of  the  amount  of  unemployment  ex- 



perienced  in  different  occupations  in  different  countries. 
This  knowledge  is  at  present  not  obtained.  The  Trade 
Union unemployment figures published by the Board of
Trade may reasonably be challenged, as they often are, as
not affording an entirely complete statement of the amount
of unemployment in this country. But such as they are,
there are, I believe, no similarly extensive statistics in any
other country comparable with them. The importance of
unemployment  as  a  social  fact  is  undeniable,  and  every  effort 
should  be  made  to  ascertain  its  real  extent.  This  may  be 
largely,  if  not  wholly,  accomplished  by  means  of  Trade 
Unions,  Labor  Exchanges,  and  Unemployment  and  other 
social  insurance  schemes.  Until  this  information  is  forth- 
coming, it  appears  clear  that  wages  statistics  will  not  be 
capable  of  complete  interpretation  or  of  precise  comparison. 

The Coefficient of Correlation ¹

In  many  studies  it  is  necessary  or  at  least  desirable  to  test 
the  existence  of  concomitant  variation  between  two  series  of 
variable  quantities.  A  comparison  of  the  plotted  variables 
furnishes  a  rough,  but  for  some  purposes  adequate,  means  of 
examining  the  relationship.  Figure  1  is  an  example  of  this 
sort  of  comparison.  However,  the  use  of  curves  is  not  to  be 
recommended  for  careful  work  because  of  the  difficulty  in 
selecting  the  proper  scales  and  the  dangers  resulting  from  per- 
sonal bias.  The  usual  tabular  method  is  slightly  more  refined 
but  tables  involve  too  many  figures  to  give  an  adequate  idea 
of  the  conditions  and  give  no  concise  measure  of  the  degree 
of  relationship. 

1 Adapted with permission from Reed, William Gardner, "The Coefficient
of  Correlation,"  Quarterly  Publications  of  the  American  Statistical  Association, 
June,  1917,  pp.  670-684. 




The  English  biometricians  have  perfected  a  method  of 
stating  the  degree  of  relationship,  which  was  invented  by 
Bravais  about  1845.  "Correlation  may  be  briefly  defined  as 
the  tendency  towards  concomitant  variation  and  the  so-called 
correlation  coefficient  is  simply  a  measure  of  such  tendency, 


[Figure 1. Relation between the July rainfall and the yield of corn, 1888-1915. Scales: rainfall departure from -3.0 to +3.0; yield departure from -12 to +12. The solid line indicates the departure of the average rainfall from the normal for the month of July over the following named States for the 25 years indicated: Ohio, Indiana, Illinois, Iowa, Nebraska, Kansas, Missouri, and Kentucky. The broken line shows the departure of the average yield of corn from the normal, in bushels per acre, for the same area and period. Source: U.S. Weather Bureau, National Weather and Crop Bulletin, June 20, 1916.]


more or less adequate according to the circumstances of the case." ¹
The early statements of the use of the coefficient of correlation
indicate clearly that the attempt to obtain such a coefficient
from miscellaneous material is an abuse of this method
of measuring relationship.² The material in hand should be

1  Brown,  W. :  The  Essentials  of  Mental  Measurement,  Cambridge,  Uni- 
versity Press,  1911,  p.  42.      (Italics  are  the  present  writer's.) 

2 Yule, G. U.: Introduction to the Theory of Statistics, ed. 2, London,
Griffin & Co., 1912, pp. 169, 177.



investigated  carefully  before  any  attempt  is  made  to  deter- 
mine the  relationship  by  the  use  of  the  coefficient  of  correla- 
tion. This  investigation  may  take  the  form  of  a  correlation 
table  or  of  a  "dot  chart"  after  Galton's  graphic  method  of 
correlation.¹

Method  of  Procedure 

If  the  coefficient  of  correlation  is  to  have  any  definite 
meaning,  the  procedure  must  be  somewhat  as  follows : 

1.  The  material  (e.  g.  Table  I)  should  be  arranged  in  groups 
in  the  form  of  a  correlation  table  (Table  II),  or,  better,  plotted 
as  a  dot  chart  (Figure  2).  The  table  or  chart  should  then  be 
carefully  examined  to  see  whether  the  points  may  be  general- 
ized to  a  straight  line,  that  is,  whether  there  is  a  tendency 
for  a  high  value  of  one  variable  to  be  associated  with  high 
values  of  the  other  variable  and  proportionately  higher  or 
lower  values  of  the  one  to  be  associated  with  similar  values  of 
the  other.  This  shows  positive  linear  correlation.  When 
lower  values  of  the  one  are  associated  with  higher  values  of 
the  other,  the  correlation  is  said  to  be  negative.  For  example, 
the dots in Figure 2 may be generalized to the line AB as well
as  to  any  curve. 

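\[ M'_x = 4.0 \text{ inches} \qquad M'_y = 35 \text{ bu.} \]

\[ M_x = 4.0 + \frac{3.9}{60} = 4.1 \qquad M_y = 35 - \frac{26}{60} = 34.6 \]

\[ \Sigma x = +3.9 \qquad \Sigma y = -26 \]

\[ \Sigma x^2 = 112.67 \qquad \Sigma y^2 = 1258 \]

\[ \sigma_x = \sqrt{\frac{\Sigma x^2}{n} - \left(\frac{\Sigma x}{n}\right)^2} \qquad \sigma_y = \sqrt{\frac{\Sigma y^2}{n} - \left(\frac{\Sigma y}{n}\right)^2} \]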

1 See Davenport, C. B., "Statistical Methods," ed. 3, New York, Wiley,
1914,  pp.  42-47. 




Table I

Correlation of July Rainfall and the Yield of Corn in Ohio

(Smith, J. W.: The Effect of Weather upon the Yield of Corn in Ohio. Washington,
Mo. Weather Rev., Vol. 42, 1914, p. 80.)

           July Rainfall (inches)      Yield of Corn (bushels per acre)
Year       Amount      x       x²      Yield      y      y²        xy

1854        2.6      -1.4     1.96     26.0      -9      81     +12.6
1855        5.8      +1.8     3.24     39.7      +5      25      +9.0
1856        2.6      -1.4     1.96     27.7      -7      49      +9.8
1857        4.9      +0.9     0.81     36.6      +2       4      +1.8
1858        4.7      +0.7     0.49     27.7      -7      49      -4.9
1859        1.6      -2.4     5.76     29.5      -5      25     +12.0
1860        5.8      +1.8     3.24     38.2      +3       9      +5.4
1861        3.3      -0.7     0.49     33.5      -1       1      +0.7
1862        3.6      -0.4     0.16     30.0      -5      25      +2.0
1863        2.6      -1.4     1.96     27.0      -8      64     +11.2
1864        2.1      -1.9     3.61     27.0      -8      64     +15.2
1865        5.7      +1.7     2.89     35.0       0       0       0.0
1866        5.1      +1.1     1.21     36.5      +2       4      +2.2
1867        3.2      -0.8     0.64     29.8      -5      25      +4.0
1868        2.7      -1.3     1.69     34.4      -1       1      +1.3
1869        4.8      +0.8     0.64     28.4      -7      49      -5.6
1870        4.7      +0.7     0.49     37.5      +3       9      +2.1
1871        3.7      -0.3     0.09     36.7      +2       4      -0.6
1872        6.7      +2.7     7.29     40.9      +6      36     +16.2
1873        6.2      +2.2     4.84     35.1       0       0       0.0
1874        3.8      -0.2     0.04     39.2      +4      16      -0.8
1875        6.9      +2.9     8.41     34.2      -1       1      -2.9
1876        6.4      +2.4     5.76     36.9      +2       4      +4.8
1877        3.7      -0.3     0.09     32.5      -2       4      +0.6
1878        5.4      +1.4     1.96     37.8      +3       9      +4.2
1879        4.2      +0.2     0.04     34.3      -1       1      -0.2
1880        4.2      +0.2     0.04     38.9      +4      16      +0.8
1881        3.6      -0.4     0.16     31.0      -4      16      +1.6
1882        3.2      -0.8     0.64     34.0      -1       1      +0.8
1883        4.2      +0.2     0.04     24.2     -11     121      -2.2
1884        3.8      -0.2     0.04     33.3      -2       4      +0.4
1885        3.2      -0.8     0.64     36.8      +2       4      -1.6
1886        2.9      -1.1     1.21     33.5      -1       1      +1.1
1887        2.2      -1.8     3.24     30.5      -4      16      +7.2
1888        4.4      +0.4     0.16     38.9      +4      16      +1.6
1889        4.2      +0.2     0.04     32.3      -3       9      -0.6
1890        2.0      -2.0     4.00     24.6     -10     100     +20.0
1891        3.8      -0.2     0.04     35.6      +1       1      -0.2
1892        3.8      -0.2     0.04     33.3      -2       4      +0.4
1893        2.5      -1.5     2.25     29.1      -6      36      +9.0
1894        1.6      -2.4     5.76     32.6      -2       4      +4.8
1895        2.0      -2.0     4.00     33.7      -1       1      +2.0
1896        8.1      +4.1    16.81     41.7      +7      49     +28.7
1897        4.6      +0.6     0.36     34.3      -1       1      -0.6
1898        4.0       0.0     0.00     37.4      +2       4       0.0
1899        4.2      +0.2     0.04     38.1      +3       9      +0.6
1900        4.6      +0.6     0.36     42.6      +8      64      +4.8
1901        2.7      -1.3     1.69     30.0      -5      25      +6.5
1902        4.7      +0.7     0.49     38.8      +4      16      +2.8
1903        3.7      -0.3     0.09     31.5      -3       9      +0.9
1904        4.1      +0.1     0.01     32.8      -2       4      -0.2
1905        3.9      -0.1     0.01     37.9      +3       9      -0.3
1906        5.1      +1.1     1.21     42.2      +7      49      +7.7
1907        5.4      +1.4     1.96     34.8       0       0       0.0
1908        4.1      +0.1     0.01     36.1      +1       1      +0.1
1909        3.8      -0.2     0.04     38.7      +4      16      -0.8
1910        3.2      -0.8     0.64     36.6      +2       4      -1.6
1911        2.4      -1.6     2.56     38.6      +4      16      -6.4
1912        5.7      +1.7     2.89     42.8      +8      64     +13.6
1913        5.2      +1.2     1.44     37.8      +3       9      +3.6

Sum of positive departures   +34.1                      +99
Sum of negative departures   -30.2                     -125
Totals                        +3.9   112.67             -26    1258    +201.4



[Figure 2. Correlation between July precipitation and yield of corn in Ohio. Dot chart. Horizontal axis: July precipitation in inches. Vertical axis: yield of corn. AB: line of relation.]




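\[ \sigma_x = \sqrt{\frac{112.67}{60} - 0.0036} = 1.4 \qquad \sigma_y = \sqrt{\frac{1258}{60} - \left(\frac{26}{60}\right)^2} = 4.6 \]

\[ \Sigma xy = 201.4 \]

\[ r = \frac{\dfrac{\Sigma xy}{n} - \left(\dfrac{\Sigma x}{n}\right)\left(\dfrac{\Sigma y}{n}\right)}{\sigma_x \sigma_y} = \frac{\dfrac{201.4}{60} - \dfrac{3.9}{60} \cdot \dfrac{-26}{60}}{1.4 \times 4.6} = \frac{3.36 + 0.03}{6.44} = 0.526 \pm E_r \]

\[ E_r = \pm 0.674 \, \frac{1 - r^2}{\sqrt{n}} = \pm 0.674 \times \frac{0.723}{7.7} = \pm 0.063 \]

\[ r = +0.526 \pm 0.063 \]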

Note  :  r  is  not  the  same  here  as  in  the  original  paper  be- 
cause a  single  average  yield  of  corn  has  been  used  for  sim- 
plicity. 
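
The arithmetic of this short method can be checked mechanically. The sketch below is an illustration, not part of the original paper: it takes only the totals printed at the foot of Table I and applies the formulas for the standard deviations, r, and the probable error. The function names and the `sigma_digits` argument are assumptions introduced here; rounding the standard deviations to one decimal place, as the worked example does, reproduces the printed figures, while carrying them unrounded gives a coefficient of about 0.54.

```python
import math

# Totals printed in Table I (n = 60 years, 1854-1913). Departures are
# measured from the arbitrary origins M'x = 4.0 in. and M'y = 35 bu.
n = 60
sum_x, sum_y = 3.9, -26.0
sum_x2, sum_y2 = 112.67, 1258.0
sum_xy = 201.4


def short_method_r(n, sx, sy, sx2, sy2, sxy, sigma_digits=None):
    """Coefficient of correlation from sums of departures about arbitrary origins."""
    mean_dx, mean_dy = sx / n, sy / n  # corrections for the arbitrary origins
    sigma_x = math.sqrt(sx2 / n - mean_dx ** 2)
    sigma_y = math.sqrt(sy2 / n - mean_dy ** 2)
    if sigma_digits is not None:
        # Optional rounding, to follow the hand computation in the text.
        sigma_x = round(sigma_x, sigma_digits)
        sigma_y = round(sigma_y, sigma_digits)
    return (sxy / n - mean_dx * mean_dy) / (sigma_x * sigma_y)


def probable_error(r, n, factor=0.674):
    """Probable error of r, using the factor 0.674 given in the text."""
    return factor * (1.0 - r ** 2) / math.sqrt(n)


r_text = short_method_r(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy, sigma_digits=1)
r_full = short_method_r(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(f"r with rounded sigmas   = {r_text:+.3f} +/- {probable_error(r_text, n):.3f}")
print(f"r with unrounded sigmas = {r_full:+.3f} +/- {probable_error(r_full, n):.3f}")
```

The difference between the two results comes entirely from rounding 1.37 up to 1.4 and 4.56 up to 4.6 before multiplying, a reminder that intermediate rounding in hand computation can move the second decimal of the coefficient.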

Explanation  of  Symbols 

$n$  number of observations (years of record).

$M_x$  true mean July precipitation.

$M'_x$  some arbitrary number near $M_x$.

$M_y$  true mean yield of corn.

$M'_y$  some arbitrary number near $M_y$.

$x$  departure of each July precipitation from $M'_x$.

$y$  departure of yield of corn in each year from $M'_y$.

$\Sigma x$  algebraic sum of departures of July precipitation.

$\Sigma y$  algebraic sum of departures of yield of corn.

$\Sigma x^2$  algebraic sum of squares of departures of July precipitation.

$\Sigma y^2$  algebraic sum of squares of departures of yield of corn.

$\Sigma xy$  algebraic sum of products of departures ($x$ and $y$).

$\sigma_x$  standard deviation of July precipitation.

\[ \sigma_x = \sqrt{\frac{\Sigma x^2}{n} - \left(\frac{\Sigma x}{n}\right)^2} \]

$\sigma_y$  standard deviation of yield of corn.

\[ \sigma_y = \sqrt{\frac{\Sigma y^2}{n} - \left(\frac{\Sigma y}{n}\right)^2} \]

$r$  coefficient of correlation.

\[ r = \frac{\dfrac{\Sigma xy}{n} - \left(\dfrac{\Sigma x}{n}\right)\left(\dfrac{\Sigma y}{n}\right)}{\sigma_x \sigma_y} \]

$E_r$  probable error of the coefficient of correlation.

\[ E_r = \pm 0.674 \, \frac{1 - r^2}{\sqrt{n}} \]


Table II. Correlation Tables Showing the Relation Between
July Precipitation and the Yield of Corn in Ohio

(From Smith, J. W., The Effect of Weather on the Yield of Corn, Washington,
Mo. Weather Rev., Vol. 42, 1914, pp. 78-93.)

July Precipitation            Yield of Corn in Bushels per Acre
in Inches           20.0 to 24.9  25.0 to 29.9  30.0 to 34.9  35.0 to 39.9  40.0 to 44.9

80-89                                                                             1
70-79
60-69                                                 1             1             1
50-59                                                 1             7             2
40-49                    1             2              4             8             1
30-39                                  1              8             7
20-29                    1             5              5             1
10-19                                  1              1



2.    If it appears from this examination that a straight line
is as good a fit as any other type of curve not too complicated
to be useful as a measure of relationship, the data may be
replotted on a new dot chart for which the unit of measurement
on one axis is the standard deviation of one of the variables,
and the unit on the other axis is the standard deviation
of the other variable (see Figure 3).

[Figure 3. Correlation between July precipitation and yield of corn in Ohio. Unit σx = 1.4 in. AB: line of relation. CD: line of relation for perfect correlation. r (coefficient of correlation) = tan X'OB'.]

3.  The  position  of  the  straight  line  which  most  nearly 
satisfies  the  data  on  the  second  dot  chart  may  be  determined 
rigidly  by  the  method  of  least  squares.  When  the  standard 
deviation  of  one  variable  is  used  as  the  unit  of  the  ordinates 
and  the  standard  deviation  of  the  other  variable  as  the  unit 
of  the  abscissae,  the  angles  between  this  straight  line  of  closest 
fit  and  the  axis  are  significant.  If  these  angles  are  equal,  i.e. 
each  45°,  the  relationship  between  the  variables  is  perfect 
(see  C-D  in  Figure  3).  If  the  line  coincides  with  one  axis  or 
the  other  no  relationship  is  shown,  although  the  converse  is 
not necessarily true.¹ Positions between these two show
partial  relationship  (see  A'B'  in  Figure  3). 

4.  The coefficient of correlation is merely a statement of
the position of the straight line of closest fit on a chart where
the units are the standard deviations of the variables as
this position is determined by the least square adjustment.²
The coefficient of correlation is expressed as the tangent
of the angle made by the line of closest fit and the axis to
which it is more nearly parallel (e.g. angle X'OB' in Figure 3
is 27¾°, tan X'OB' = +0.526). In actual practice the coefficient
of correlation may be determined mathematically from
the data as shown in Table I without plotting the material
on a dot chart, like Figure 3. However, the coefficient should
never be attempted without first investigating the relationship
far enough to see if it follows a straight line. That is,
steps 2 and 3 may be omitted in practice; step 1 should never
be omitted.

5.  If  the  examination  of  the  correlation  table  or  dot  chart 

1 Yule, G. U.: Introduction to the Theory of Statistics, ed. 2, London,
Griffin and Co., 1912, pp. 174-175.
2 Ibid., p. 172.




shows  that  the  relation  is  not  that  of  a  simple  straight  line, 
the  coefficient  of  correlation  is  not  a  measure  of  the  relation- 
ship between  the  variables. 
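
Steps 2 to 4 can be sketched numerically. The example below is an illustration under stated assumptions, not the author's computation: it uses only the first ten years of Table I (1854-1863), chosen for brevity, expresses each variable in units of its own standard deviation, fits the least-squares line on that standardized chart, and compares the slope with the coefficient computed directly. The function names are introduced here for the sketch.

```python
import math

# July rainfall (inches) and yield of corn (bu. per acre), 1854-1863,
# taken from the first ten rows of Table I.
rain = [2.6, 5.8, 2.6, 4.9, 4.7, 1.6, 5.8, 3.3, 3.6, 2.6]
corn = [26.0, 39.7, 27.7, 36.6, 27.7, 29.5, 38.2, 33.5, 30.0, 27.0]


def standardize(values):
    """Departures from the mean, measured in standard-deviation units."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sigma for v in values]


def least_squares_slope(xs, ys):
    """Slope of the least-squares line of ys on xs, both centred on zero."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def pearson_r(xs, ys):
    """Product-moment coefficient computed directly from the raw values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sxy / (sx * sy)


zx, zy = standardize(rain), standardize(corn)
slope = least_squares_slope(zx, zy)
r = pearson_r(rain, corn)
angle = math.degrees(math.atan(slope))
print(f"slope on standardized chart = {slope:.3f}, r = {r:.3f}, angle = {angle:.1f} degrees")
```

The slope and the directly computed coefficient agree to rounding error, which is the sense in which r is the tangent of the angle between the line of closest fit and the axis once both scales are in standard-deviation units.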

Limitations  of  the  Coefficient  of  Correlation 

It  is  clear  even  from  a  superficial  study  of  the  question  that 
the  coefficient  of  correlation  obtained  from  material  where  a 
straight  line  relationship  does  not  obtain  may  be  too  small, 

[Figure 4. Predicted height of the higher high water for each day after new moon, at Old Point Comfort, Va. Horizontal axis: days after new moon, July 29, 1916. Source: U.S. Coast and Geodetic Survey, General Tide Tables for the Year 1916, p. 103.]


but will never be too large.¹ A coefficient of correlation may
be  near  zero  when  there  is  very  close  relationship,  as  is  shown 
in  such  a  condition  as  the  relationship  between  the  height 
of  high  water  and  the  phase  of  the  moon  which  is  shown  for 
Old  Point  Comfort,  Va.,  by  Table  III  and  Figure  4.  The 
figure  indicates  that  the  relation  is  harmonic ;  although  there 
is  a  close  and  very  definite  relation  between  the  phenomena, 
the  coefficient  of  correlation  is  near  zero  (  —  0.106  ±.088)  be- 
cause the  different  portions  of  the  curve  of  regression  are 
in  such  relations  to  each  other  that  a  straight  line  along  an 
axis  will  most  nearly  satisfy  all  the  points.  Of  course  the 
angle  is  then  zero  and  its  tangent  is  zero. 
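
The point can be illustrated with an artificial series. The sketch below does not use the tide data of Table III; it assumes a hypothetical height that follows an exact cosine curve with a 30-day period over 60 days, so that the height is completely determined by the day. The amplitude, mean level, and period are invented for the illustration.

```python
import math


def pearson_r(xs, ys):
    """Product-moment coefficient of correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sxy / (sx * sy)


# Hypothetical harmonic relation: two complete 30-day cycles.
days = list(range(60))
heights = [2.6 + 0.3 * math.cos(2 * math.pi * d / 30) for d in days]

r = pearson_r(days, heights)
print(f"r between day and height = {r:+.3f}")
```

Although every height here is fixed exactly by its day, the straight line of closest fit is nearly horizontal and the coefficient is close to zero, which is the situation the tide example describes.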

1 See Yule, G. U., "Introduction to the Theory of Statistics," ed. 2,
London,  Griffin  &  Co.,  1912,  p.  175,  and  Brown,  W.,  "The  Essentials  of 
Mental  Measurement,"  Cambridge,  University  Press,  1911,  pp.  27-59. 




Table III. Correlation of Time After New Moon and Predicted
Height of the Higher High Water at Old Point Comfort, Va.

(U. S. Coast and Geodetic Survey, General Tide Tables for the Year 1916, p. 103.)

Days After New Moon,                  Height Above
July 29, 1916            x      x²      M. L. W.       y       y²       xy

 0                     -30     900        2.7        +0.1     0.01     -3.0
 1                     -29     841        2.6         0       0         0
 2                     -28     784        2.6         0       0         0
 3                     -27     729        2.5        -0.1     0.01     +2.7
 4                     -26     676        2.4        -0.2     0.04     +5.2
 5                     -25     625        2.4        -0.2     0.04     +5.0
 6                     -24     576        2.5        -0.1     0.01     +2.4
 7                     -23     529        2.5        -0.1     0.01     +2.3
 8                     -22     484        2.5        -0.1     0.01     +2.2
 9                     -21     441        2.6         0       0         0
10                     -20     400        2.7        +0.1     0.01     -2.0
11                     -19     361        2.8        +0.2     0.04     -3.8
12                     -18     324        2.9        +0.3     0.09     -5.4
13                     -17     289        3.0        +0.4     0.16     -6.8
14                     -16     256        3.1        +0.5     0.25     -8.0
15                     -15     225        3.1        +0.5     0.25     -7.5
16                     -14     196        3.0        +0.4     0.16     -5.6
17                     -13     169        2.9        +0.3     0.09     -3.9
18                     -12     144        2.9        +0.3     0.09     -3.6
19                     -11     121        2.9        +0.3     0.09     -3.3
20                     -10     100        2.7        +0.1     0.01     -1.0
21                      -9      81        2.6         0       0         0
22                      -8      64        2.5        -0.1     0.01     +0.8
23                      -7      49        2.4        -0.2     0.04     +1.4
24                      -6      36        2.4        -0.2     0.04     +1.2
25                      -5      25        2.4        -0.2     0.04     +1.0
26                      -4      16        2.5        -0.1     0.01     +0.4
27                      -3       9        2.5        -0.1     0.01     +0.3
28                      -2       4        2.6         0       0         0
29                      -1       1        2.6         0       0         0
30                       0       0        2.6         0       0         0
31                      +1       1        2.5        -0.1     0.01     -0.1
32                      +2       4        2.6         0       0         0
33                      +3       9        2.6         0       0         0
34                      +4      16        2.7        +0.1     0.01     +0.4
35                      +5      25        2.7        +0.1     0.01     +0.5
36                      +6      36        2.6         0       0         0
37                      +7      49        2.6         0       0         0
38                      +8      64        2.6         0       0         0
39                      +9      81        2.6         0       0         0
40                     +10     100        2.7        +0.1     0.01     +1.0
41                     +11     121        2.8        +0.2     0.04     +2.2
42                     +12     144        2.9        +0.3     0.09     +3.6
43                     +13     169        2.9        +0.3     0.09     +3.9
44                     +14     196        2.9        +0.3     0.09     +4.2
45                     +15     225        3.1        +0.5     0.25     +7.5
46                     +16     256        3.1        +0.5     0.25     +8.0
47                     +17     289        3.0        +0.4     0.16     +6.8
48                     +18     324        2.9        +0.3     0.09     +5.4
49                     +19     361        2.7        +0.1     0.01     +1.9
50                     +20     400        2.5        -0.1     0.01     -2.0
51                     +21     441        2.4        -0.2     0.04     -4.2
52                     +22     484        2.3        -0.3     0.09     -6.6
53                     +23     529        2.2        -0.4     0.16     -9.2
54                     +24     576        2.3        -0.3     0.09     -7.2
55                     +25     625        2.3        -0.3     0.09     -7.5
56                     +26     676        2.4        -0.2     0.04     -5.2
57                     +27     729        2.4        -0.2     0.04     -5.4
58                     +28     784        2.5        -0.1     0.01     -2.8
59                     +29     841        2.6         0       0         0
60                     +30     900        2.8        +0.2     0.04     +6.0

Sum of positive departures                           +6.9
Sum of negative departures                           -3.9
Totals                       18910                   +3.0     3.24    -25.1


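\[ M_x = 30 \qquad M_y = 2.6 + \frac{3.0}{61} = 2.65 \]

\[ \frac{\Sigma x}{n} = 0 \qquad \frac{\Sigma y}{n} = +0.05 \]

\[ \Sigma x^2 = 18910 \qquad \Sigma y^2 = 3.24 \]

\[ \sigma_x = \sqrt{\frac{18910}{61}} = 17.6 \qquad \sigma_y = \sqrt{\frac{3.24}{61} - 0.003} = 0.22 \]

\[ \Sigma xy = -25.1 \]

\[ r = \frac{\dfrac{-25.1}{61} - 0}{17.6 \times 0.22} = \frac{-0.411}{3.87} = -0.106 \pm E_r \]

\[ E_r = 0.674 \, \frac{1 - (-0.106)^2}{\sqrt{61}} = 0.674 \times \frac{1 - 0.011}{7.8} = 0.674 \times 0.13 = 0.088 \]

\[ r = -0.106 \pm 0.088 \]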

When  the  relation  is  not  linear  the  concomitant  variation 
may  be  shown  by  the  use  of  a  "correlation  ratio,"  which  is 
simply  a  further  development  of  the  theory  of  correlation.^ 

It  is,  however,  not  the  purpose  of  this  paper  to  consider 
relationships  shown  by  curves  of  a  higher  order  than  a 
straight  line,  as  such  correlations  involve  more  complicated 
mathematical  theory  and  also  require  many  more  observa- 
tions to  be  significant. 

1 See Pearson, K., "Mathematical Contributions to the Theory of
Evolution," 14, on the general theory of skew correlation and non-linear
regression. London, Drapers' Company Research Memoirs, Biometric
Series 2, 1905. Brown, W., "The Essentials of Mental Measurement,"
Cambridge, University Press, 1911, pp. 57-59.



Adequacy  of  the  Coefficient  of  Correlation 

The  conclusion  seems  legitimate  that  the  coefficient  of 
correlation  may  be  used  strictly  as  a  measure  of  relationship, 
when  such  relationship  has  been  determined  by  other  investi- 
gation to  follow  straight  line  relations.  The  use  of  the  coeffi- 
cient of  correlation  is  to  be  recommended  because  it  is  inde- 
pendent of  the  personal  equation  of  the  investigator,  and  of 
the  units  employed,  and  because  it  shows  rigidly  the  correct 
position of the line indicated by the dot chart.

In using the coefficient of correlation it is desirable to calculate
the probable error (see Tables I and III for method).¹
The  probable  error  is  that  divergence  from  the  observed  mean 
on  either  side  within  which  half  the  observations  lie.  Its 
size  is  a  measure  of  how  closely  the  results  from  an  infinite 
number  of  cases  would  correspond  with  those  obtained  from 
the  observed  cases.  When  the  coefficient  of  correlation  is  not 
greater  than  its  probable  error  there  is  no  evidence  that  there 
is  any  correlation ;  but  when  the  coefficient  of  correlation  is 
clearly  greater  than  its  probable  error  correlation  is  indicated ; 
and  when  it  is  much  greater  (six  times  as  great  is  an  accepted 
empirical  amount)  it  may  be  safely  assumed  that  there  is 
concomitant variation.²
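
The rule of thumb in this paragraph can be written out as a small routine. The sketch below is one reading of the rule, not a formal test: the cut-off of six probable errors is the empirical amount cited in the text, the treatment of the range between one and six probable errors as "indicated only if the excess is clear" is an interpretation made here, and the function names are hypothetical.

```python
import math


def probable_error(r, n, factor=0.674):
    """Probable error of r, with the factor 0.674 used in the text."""
    return factor * (1.0 - r * r) / math.sqrt(n)


def reading(r, n, strong_multiple=6.0):
    """Return (probable error, |r| / probable error, verbal reading)."""
    pe = probable_error(r, n)
    ratio = abs(r) / pe
    if ratio <= 1.0:
        verdict = "no evidence of correlation"
    elif ratio < strong_multiple:
        verdict = "correlation indicated only if the excess is clear"
    else:
        verdict = "concomitant variation may be safely assumed"
    return pe, ratio, verdict


# The two coefficients worked out in Tables I and III.
for label, r, n in [("rainfall and corn", 0.526, 60), ("moon and tide", -0.106, 61)]:
    pe, ratio, verdict = reading(r, n)
    print(f"{label}: r = {r:+.3f}, probable error = {pe:.3f}, ratio = {ratio:.1f}: {verdict}")
```

On this reading the rainfall coefficient is more than eight times its probable error, while the tide coefficient barely exceeds its own, which agrees with the treatment of the two examples in the text.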

The  coefficient  of  correlation  is  obtained  by  applying  the 
least  square  adjustment  to  all  the  material  and  is,  therefore, 
the  straight  line  of  closest  fit.  If  the  relationship  is  not  that 
of a straight line, it is obvious that the straight line of closest
fit  is  not  a  good  measure  of  the  relationship  and  that  some 
other  measure   (e.g.  the   correlation   ratio)  must   be  used. 

1 For a general discussion of the significance of probable error see Yule,
G. U., "Introduction to the Theory of Statistics," ed. 2, London, Griffin &
Co., 1912, pp. 310-311.

2 See Bowley, A. L., "Elements of Statistics," ed. 3, New York, Scribner,
1907, p. 320.



Therefore,  the  coefficient  of  correlation  should  never  be  used 
to  show  relationship  until  after  the  phenomena  have  been 
investigated,  at  least  far  enough  to  show  whether  a  straight 
line  satisfies  the  relationship  as  well  as  any  other  curve. 

Literature 

The  development  of  the  theory  of  correlation  resulting  in 
the  adoption  and  use  of  the  coefficient  of  correlation  is,  of 
course, largely mathematical. While the literature on the
subject  is  considerable,  the  greater  part  of  the  contributions 
are  concerned  with  the  application  of  the  coefficient  to  par- 
ticular problems,  and  hence  the  development  of  the  theory  of 
correlation  is  incidental  and  widely  scattered. 

"The  fundamental  theorems  of  correlation  were  for  the 
first time and almost exhaustively discussed by A. Bravais ¹
.  .  .  [more  than]  half  a  century  ago.  He  deals  completely 
with  the  correlation  of  two  and  three  variables.  Forty  years 
later Mr. J. D. Hamilton Dickson ² dealt with a special problem
proposed to him by Mr. Galton, and reached on a somewhat
narrow basis some of Bravais' results for correlation
of  two  variables.  Mr.  Galton  at  the  same  time  introduced 
an  improved  notation  which  may  be  summed  up  in  the  'Gal- 
ton Function'  or  coefficient  of  correlation.  This  indeed  ap- 
pears in  Bravais'  work,  but  a  single  symbol  is  not  used  for  it. 
In  1892  Professor  Edgeworth,  also  unconscious  of  Bravais' 
memoir, dealt in a paper on 'Correlated Averages' with correlation
for three variables.³ He obtained results identical

1 Analyse mathématique sur les probabilités des erreurs de situation d'un
point. Paris, Académie des Sciences, Mémoires présentés par divers savants,
Series 2, Vol. 9, 1846, pp. 255-332.

2 Appendix to Galton, F., "Family Likeness in Stature," London, Royal
Society, Proceedings, Vol. 40, 1886, pp. 63-73.

3 London, Philosophical Magazine, Series 5, Vol. 34, 1892, pp. 190-204.



with Bravais, although expressed in terms of 'Galton's functions.'" ¹

The  following  publications  contain  complete  statements  of 
the  later  development : 

Pearson,  Karl:  Contributions  to  the  mathematical  theory  of 
evolution  ;  London,  Royal  Society,  Philosophical  Transactions, 
Series  A,  as  follows  : 

1.  On    the    dissection    of    frequency   curves,    Vol.    185,    1894, 

pp.  71-110. 

2.  Skew variations in homogeneous material, Vol. 186, 1895,

pp. 343-414.

3.  Regression,  heredity,  and  panmixia,  Vol.  187,  1896,  pp.  253- 

318. 

4.  On  the  probable  errors  of  frequency  constants  and  on  the 

influence  of  random  selection  on  variation  and  correlation, 
Vol.  191,  1898,  pp.  229-311. 

5.  On  the  reconstruction  of  the  stature  of  prehistoric  races. 

Vol.  192,  1898,  pp.  169-244. 

6.  Genetic (reproductive) selection; inheritance of fertility in

man  and  of  fecundity  in  thoroughbred  race  horses.  Vol. 
192,  1899,  pp.  257-330. 

7.  On  the  correlation  of  characters  not  quantitatively  measur- 

able, Vol.  195,  1900,  pp.  1-47. 

8.  On  the  inheritance  of  characters  not  quantitatively  measur- 

able. Vol.  195,  1900,  pp.  75-150. 

9.  On  the  principle  of  homotyposis  and  its  relation  to  heredity, 

to  the  variability  of  the  individual,  and  to  that  of  the 
race, Vol. 197, 1901, pp. 285-379.

10.  Supplement  to  a  memoir  on  skew  variation,  Vol.  197,  1901, 

pp.  443-459. 

11.  On  the  influence  of  natural  selection  on  the  variability  and 

correlation  of  organs.  Vol.  200,  1902,  pp.  1-66. 

12.  On   a   generalized    theory   of   alternative   inheritance   with 

special  reference  to  Mendel's  Laws,  Vol.  203,  1904,  pp. 
53-86. 

1 Pearson, Karl, London Royal Society Philosophical Transactions, Series A,
Vol.  187,  1896,  p.  261. 



In London, Drapers' Company Research Memoirs, Biometric Series.

13.  On  the  theory  of  contingency  and  its  relation  to  association 

and  normal  correlation.     Memoir  1. 

14.  On  the  general  theory  of  skew  correlation  and  non-linear 

regression.     Memoir  2. 

15.  On  the  mathematical  theory  of  random  migration.     Memoir 

3,  1906. 

16.  On  further  methods  of  determining  correlation.     Memoir  4, 

1907. 

17.  [Not  published.] 

18.  On   a   novel   method   of  regarding   the   association   of   two 

variates  classed  solely  in  alternate  categories.     Memoir  7, 
1912. 
Pearson,    Karl  :      On    the    partial    correlation    ratio.      London, 

Royal  Society,  Proceedings,  Series  A,  Vol.  91,  1915,  pp.  492-498. 
Brown,  W.  :     The  essentials  of  mental  measurement,  Cambridge, 

University  Press,  1911. 
Elderton, W. P.: Frequency curves and correlation. London,

Layton  Brothers,  1906. 
Hooker,   R.   H.  :     Correlation  of  successive  observations.   Royal 

Statistical  Society  Journal,  Vol.  68,  pp.  676-703. 
Tolley, H. R.: The theory of correlation as applied to farm survey

data  on  fattening  baby  beef,  U.  S.  Department  of  Agriculture 

Bui.  504,  Washington,  Govt.  Ptg.  Office,  1917. 
Walker,  Gilbert  T.  :  Correlation  in  seasonal  variation  of  weather, 

Indian  Meteorological   Department    Memoirs,    Simla,    1909- 

1915. 

1.  Correlation  in  seasonal  variation  of  climate.  Vol.  20,  part  6, 

1909,  pp.  117-124. 

2.  (A)  On  the  probable  error  of  a  coefficient  of  correlation  with 

a  group  of  factors. 
(B)  Some   applications   of  statistical   methods   to  seasonal 
forecasting,  Vol.  21,  part  2,  1910,  pp.  22-45. 

3.  On  the  criterion  for  the  reality  of  relationships  or  periodici- 

ties, Vol.  21,  part  9,  1914,  pp.  13-16. 

4.  Sunspots  and  rainfall.  Vol.  21,  part  10,  1915,  pp.  17-60. 

5.  Sunspots  and  temperature,  Vol.  21,  part  11,  1915,  pp.  61-90. 

6.  Sunspots  and  pressure,  Vol.  21,  part  12,  1915,  pp.  91-118. 



Yule,  G.  Udny  :     Introduction  to  the  theory  of  statistics,  ed.  2, 
London,  C.  Griffin  &  Co.,  1912,  pp.  157-253. 

More  elementary  discussions  are  contained  in  the  following 
papers : 

Persons,  W.  M.  :  The  correlation  of  economic  statistics.  Boston, 
American  Statistical  Association,  Quarterly  Publications,  Vol. 
12  (1910),  pp.  287-322. 

Hooker,  R.  H.  :  An  elementary  explanation  of  correlation :  illus- 
trated by  rainfall  and  the  depth  of  water  in  a  well ;  London, 
Royal  Meteorological  Society  Quarterly  Journal,  Vol.  34,  1908, 
pp.  277-291. 

Elderton, W. P. and E. M.: Primer of statistics, London, A. and
C.  Black,  1910,  pp.  55-72. 

King,  W.  I. :  Elements  of  statistical  method,  New  York,  Macmillan, 
1912,  pp.  197-215. 

Dines,  W.  H.  :    The  practical  application  of  statistical  methods  to 
meteorology.     London,  H.  M.  Meteorological  Office,  The  com- 
puter's handbook  (M.  O.  223),  section  5,  part  2,  1915,  pp.  V29- 
V52. 

The most complete bibliographies will be found in:

Yule,  G.  Udny  :  Introduction  to  the  theory  of  statistics,  London, 
C.  Griffin  &  Co.,  1912,  pp.  188,  208-209,  225-226,  and  252. 

Davenport,  C.  B.  :  Statistical  methods  with  special  reference  to 
biological  variation,  third,  revised  edition.  New  York,  J.  Wiley 
&  Sons,  1914,  pp.  62  and  85-104. 

Statistical Standards in the Interpretation of Facts ¹

Given a related group of statistical facts that have been collected,
tabulated, and graphically expressed, to what standards must an
interpretation of them conform? To fail to attach meaning and
significance to them is simply to accentuate the all too prevailing
practice of leaving untranslated into standards and principles the
myriads of facts daily growing out of, or experienced in, human
relations.

¹ Adapted from Secrist, Horace, "Statistical Standards in Business
Research," Quarterly Publications, American Statistical Association, March,
1920, pp. 55-57.

Certain  fundamental  standards  of  interpretation  are  the 
following : 

First.  The  truth  is  the  end  sought:  error  is  not  to  be 
disguised,  falsehood  tolerated,  nor  preconceptions  favored. 

Second.  Comparisons  can  be  made  only  between  things, 
conditions,  times,  and  places  having  common  qualities. 

Third.  In  interpretation,  facts  must  always  be  referred 
to  conditions  which  can  produce  them. 

Fourth.  Interpretation  should  extend  to  an  explanation 
of  the  past  and  a  forecast  of  the  future. 

Fifth.  Distinction  should  be  made  between  long-  and 
short-time  conditions  and  consequences;  between  transi- 
tory skirmishes  and  general  tendencies. 

Sixth.  Distinction  should  be  made  between  the  result 
of  a  single  cause  and  a  combination  of  causes. 

Seventh.  Distinction  should  be  made  between  drawing  a 
particular  deduction  and  giving  it  general  application.

Eighth.  Similarities  and  differences  should  be  appraised 
in  the  light  of  particular  application.  Similarities  which 
are  seemingly  complete  and  differences  which  are  funda- 
mental for  one  purpose  may  be  ignored  for  others. 

Ninth.  The  detail  of  interpretation  should  conform  to 
the  nature  of  the  problem  and  the  capacity  of  those  interested. 
Not  infrequently  an  exaggerated  accuracy,  which  the  nature 
of  the  basic  data  does  not  justify,  nor  the  occasion  for  sum- 
marizing warrant,  is  worked  out  in  detail  by  means  of  per- 
centages, averages,  and  other  summary  expressions.  Sim- 
ilarly, far-reaching  conclusions  are  sometimes  drawn  from 
inadequate  data  by  elaborate  and  overrefined  methods. 
Statistical  analysis  then  appears  as  an  inverted  and  unstable 
pyramid. 


Likewise,  involved  and  complex  interpretations  are  some- 
times prepared  for  those  who  are  statistically  ignorant  of 
refined  processes  or  for  those  who  are  disinclined  to  follow 
or  uninterested  in  pursuing  an  elaborate  analysis.  A  statis- 
tical interpretation  designed  to  influence  executive  action  or 
to  enlist  administrative  support  is  rarely,  if  ever,  to  be  couched 
in  the  same  language  or  to  include  the  same  detail,  as  one 
which is intended to serve the simple purpose of record. Consumers
of statistics not only differ in their statistical interests
but also in their statistical horizons.

REVIEW   PROBLEMS 

Given  the  following  data  showing  the  annual  outlay  and  value 
of  product  realized  by  51  farmers  living  near  Dallas,  Wisconsin, 
determine : 

1.  The  coefficient  of  correlation  and  its  probable  error  for  outlay 
and  value  of  product.  Record  all  the  steps  in  the  process  and  all 
significant  figures. 

2.  Given  the  data  on  page  420,  showing  the  value  of  feed  consumed 
and  product  produced  by  26  registered  cows  of  the  same  breed  and 
under  the  same  management,  determine  by  the  direct  method  for 
the  two  series,  the  coefficient  of  correlation  and  its  probable  error. 
Carefully  record  each  step  in  the  process  and  include  in  your  pres- 
entation of  method  all  significant  figures.  Use  the  nearest  whole 
numbers  —  dollars  —  in  all  instances.  (The  arrangement  of  similar 
material  in  chapter  12  of  the  Text  may  be  taken  as  a  guide.) 

What  does  the  coefficient  seem  to  show?  Do  you  regard  the 
data  as  adequate?  Why?  Is  the  coefficient  significant  according 
to  the  rule  established  by  Bowley  ? 
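
The arithmetic asked for in Problems 1 and 2 may be sketched as follows. This is an illustration only, not the arrangement given in the Text. It assumes that the "direct method" means measuring deviations from the actual arithmetic means, and that the probable error is the conventional 0.6745 (1 - r^2) / sqrt(n). The function names are hypothetical, and the five pairs are merely the first five farms of the table, read as (outlay, product) pairs; the full series of 51 or 26 pairs would be substituted in practice.

```python
import math


def pearson_r(xs, ys):
    """Coefficient of correlation from deviations about the actual means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least two pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]          # deviations of the first series
    dy = [y - mean_y for y in ys]          # deviations of the second series
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("a series with no variation has no correlation")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def probable_error(r, n):
    """Conventional probable error of r (assumed formula, see lead-in)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


# Illustrative subset only: first five farms, (outlay, value of product).
outlay = [421, 932, 434, 293, 333]
product = [1285, 2649, 1143, 727, 799]

r = pearson_r(outlay, product)
pe = probable_error(r, len(outlay))
print(f"r = {r:.4f}, probable error = {pe:.4f}, r / P.E. = {r / pe:.1f}")
```

The ratio of the coefficient to its probable error is printed because rules of significance such as the one attributed to Bowley are usually stated in terms of that ratio; the threshold itself should be taken from the Text and is not assumed here.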




Annual Outlay and Total Value of Product on Fifty-one
Farms near Dallas, Wisconsin ¹

Annual Outlay  Value of Product   Annual Outlay  Value of Product

     $ 421          $1285             $ 563           $ 962
       932           2649               620            1015
       434           1143              1392            2259
       293            727               715            1146
       333            799              1165            1868
      1683           3644               885            1410
      1334           2844               764            1162
       775           1646              1173            1778
      1026           2165               440             686
      1379           2895              1595            2358
      1344           2533              1090            1602
       961           2018               978            1435
      1675           3473              1595            2165
      1203           2472              1358            1878
      1734           3619              1703            2339
       983           2000              1018            1309
       395            749              1505            1898
      1618           3016              1492            1853
       739           1361              1211            1496
       881           1610              1103            1320
      1266           2307              1095            1219
      1124           1963               932            1009
      1695           2909              1263            1348
      1278           2192               742             759
       894           1522               804             713
                                       1469            1131

¹ Data furnished by Professor H. C. Taylor, the University of Wisconsin.




Value of Feed Consumed and Value of Product per Cow of
26 Registered Cows of the Same Breed Under the Same
Management ¹

Value of Feed   Value of Product   Value of Feed   Value of Product
  Consumed          per Cow          Consumed          per Cow

   $99.83           $246.10           $98.93           $174.64
    86.42            207.76            82.69            143.61
    91.05            216.52            82.94            143.18
    94.05            220.01            87.03            150.02
    94.06            214.87            89.07            153.51
    86.06            183.53            83.52            143.61
    84.20            176.39            83.10            140.46
    86.70            178.56            89.16            150.68
    86.75            178.11            83.01            136.60
    86.57            166.70            89.32            145.41
    88.52            169.20            82.22            131.35
    94.01            179.25            99.74            157.28
    86.23            157.20            84.77            122.22

¹ Data furnished by Professor H. C. Taylor, the University of Wisconsin.


INDEX 


Accident,  definition  of  a  tabulatable, 
165-166;  meaning  of  an,  165; 
test  of  seriousness  of  an,  163. 

Accident  frequency  rates,  meaning 
of,  167-169. 

Accident  rates,  meaning  of,  166-167. 

Accident  severity  rates,  169-184 ; 
meaning  of,  169. 

Accident  statistics,  purposes  of,  161- 
162. 

Accidents,  public  utility  statistics  of, 
161-164 ;  rates  of  industrial,  164- 
184 ;  statement  of,  as  ratios,  163- 
164. 

Accuracy,  141-147;  crop  reports 
and,  86-90 ;  degrees  of,  in  measure- 
ments of  logs,  91-95 ;  editing  of 
schedules  for,  229-232;  relative 
nature  of,  in  graphic  presentation, 
277  ;  relativity  of,  96-97,  158-159. 

Accuracy  of  death  certificates,  141- 
147. 

Advertising,  statistical  basis  for, 
38-46.

Arithmetic  mean,  nature  of,  371. 
(See  Average.) 

Average,  car  mileage  as  an,  343 ; 
car-seat  mile  as  an,  344-347 ; 
the  median  as  an,  325-326 ;  the 
meaning  and  limitations  of  an, 
318-319;  use  of  weighted,  in 
crop  reporting,  329-331. 

Average  tariff  duty,  calculations  of 
the,  334-341. 

Averages,  the  "normal"  in  crop 
reporting  and,  82-84 ;  law  of, 
331-334;  law  of,  explained,  117- 
118;  misuse  of,  190;  the  quar- 
tiles  as,  326  ;  use  of  law  of,  applied 
to advertising and selling, 118-
123 ; use of law of, applied to the
determination  of  price  policies, 
123-124 ;  use  of,  in  presenting 
wage  statistics,  318-329 ;  use  of, 
to  measure  street-car  utilization, 
341-344. 

Balanced  testimony,  a  method  of 
securing  accuracy,  104-110. 

Bars,  use  of,  274. 

Base  line,  absence  of  a,  in  logarithmic 
diagrams,  296-297. 

Base  lines,  274.     (See  Diagrams.) 

Bias,  144-147;  error  and,  331-332. 

Biased  error  and  estimates  of  crop 
acreage,  75-78. 

Bureau  of  Crop  Estimates,  method 
used  by,  in  computing  index 
numbers,  350-354. 

Business,  errors  of  use  in  statistics 
of,  28-29 ;  planning  in,  by  use  of 
statistics,  27-29  ;  practical  objects 
of  statistics  in,  26-31 ;  statistics 
in,  25-32 ;  statistics  of  internal, 
28-30;  use  and  application  of 
statistics  in,  23. 

Business  cycles,  statistical  analysis 
of,  35-37. 

Caption  headings,  relation  of  the 
stub  to,  246. 

Causation,  major  and  minor  causes 
and,  374-377;  the  statistical 
method  and,   374-378. 

Charts, use of, in commercial re-
search, 43-44. (See Diagrams.)

Classification  of  facts  and  science,  6. 

Classification,  relation  of,  to  tabu- 
lation, 269 ;  tabular  presentation 
and,  242-272. 




Coefficient,  accident  severity  rate 
as  a,  169-184 ;  necessary  char- 
acteristics of  a,  189-190. 

Coefficients,  344-347  ;  accident  fre- 
quency rates  as,  167-169 ;  as 
ratios,  163-164 ;  industrial  acci- 
dent rates  as,  164-184  ;  use  of,  in 
statistics  of  accidents,   163-164. 

Coefficients  of  correlation,  400-416. 
(See  Correlation.) 

Collection  of  crop  reports,  use  of  mail 
carriers  for,  79. 

Collection  of  data,  methods  used  in 
study  of  standing  timber,  101-110. 

Collection  of  statistics,  standards  in, 
148-149. 

Commercial  research,  questions  to 
be  answered  by,  40-41  ;  the  func- 
tion of,  39-46. 

Comparison,  difficulties  of  inter- 
national, of  wages,  398-400 ;  statis- 
tical, 397  ;    correlation,  396-420. 

Compensating  errors  and  balanced 
testimony,  104-110. 

Component-part  diagrams,  275. 

Correlation,  396-420 ;  defined,  401 ; 
statistics  and,  371-372;  symbols 
in  computation  of  the,  formula, 
403, 405-406; the coefficient of,
400-416;  the  graphic  method  as  a 
measurement  of,  400. 

Correlation  coefficient,  adequacy  of, 
412-413;  defined,  408;  limits  of 
the,  409-411;  literature  on  the, 
413-416;  method  of  calculation 
illustrated,  403-409.  (See  Coeffi- 
cient of  Correlation.) 

Correlation  table,  402. 

Cost  accounting  and  statistics,  31- 
32. 

Counting  as  an  alternative  to  an 
estimate,  95-101. 

Crises,  statistical  study  of,  36-37. 

Crop  estimates,  value  of,  64-69. 

Crop  reporting,  accuracy  of,  86-90; 
methods  of,  69-71 ;  use  of  weighted 
averages in, 329-331. (See Bureau
of  Crop  Estimates.) 

Crop reports, 64-90 ; preparation of,
72-74 ; scope of the government's,
69 ; transmission of, to the govern-
ment, 71-72.

Crops,  estimates  of,  72-74  ;  estimates 
of  acreage  of,  75-78. 

Curves,  justification  of  smoothing, 
280-282;  object  of  smoothing, 
279-280;  theory  and  justification 
of  smoothing  of,  278-282. 

Derivative  tables,  defined,  253. 

Diagrammatic  presentation,  rules  for, 
273-276. 

Diagrams,  base  lines  in,  274 ;  com- 
ponent-part, 275 ;  measurement 
of  slopes  on  logarithmic,  298-300; 
positions  of  bars  in,  275 ;  position 
of  titles  in,  274 ;  properties  of 
logarithmic,  288-297;  rules  for 
plotting  frequency,  275 ;  geo- 
graphic variations  in,  275-276 ; 
time  variations  in,  276 ;  the 
horizontal  zero  in  frequency,  385- 
394;  use  of  bars  in,  274;  lines 
in,  275 ;  logarithmic  scale  in, 
287-288. 

Difference-scale,  use  of,  in  graphics, 
283-285. 

Discrete  series,  curve  smoothing  and, 
279. 

Dispersion,  coefficient  of,  387 ; 
graphic  representation  of,  387- 
393  ;  measures  of,  387  ;  nature  of, 
386-387. 

Distribution,  method  of,  determined 
by  research,  42-43. 

Earnings,  computation  of,  208-209 ; 
definition  of,  192 ;  relation  of 
strikes  to,  192 ;  relation  of  un- 
employment to,  192 ;  wages  and, 
398. 

Editing,  accuracy  in,  229-232; 
corrective  character  of,  229 ;  for- 
mal character  of,  229 ;  reasons  for, 
229 ff. ; relation of, to tabulation,
229. 

Editing  of  schedules,  229-236;  for 
completeness, 235-236; for con-
sistency, 230, 232-234; for uni-
formity, 234-235.

Error,  141-147  ;  bias  and,  331-332  ; 
definition  and  illustration  of  the 
probable,  381-383 ;  effect  of  in- 
creasing the  number  of  samples 
on,  333-334 ;  estimate  of  acreage 
yields  of  crops  and,  77-78 ;  esti- 
mates of  crop  acreage  and,  75-78 ; 
estimates  of  livestock  and,  78. 

Errors,  compensating,  illustrated, 
331-334 ;  compensation  of,  98- 
100 ;  in  statistics  of  unemploy- 
ment, 47-57 ;  in  use  of  business 
statistics,  28-29. 

Estimates,  methods  of,  in  timber 
measurements,  95-101 ;  nature  of 
timber,  91-110. 

Estimates  of  acreage,  by  sampling, 
79-80. 

Estimates  of  acreage  yield,  77-78. 

Estimates  of  livestock,  78. 

Factory  output,  measures  of,  126- 
128;    sources  of  data  on,  137-140. 

Facts,  classification  of,  and  science, 
6. 

"Fatal"  accidents,  how  determined, 
162. 

Frequency  diagrams,  purpose  of, 
386 ;  the  horizontal  zero  in,  385- 
394 ;  types  of,  in  which  horizontal 
zero cannot be shown, 393. (See
Diagrams.) 

Frequency  series,  essential  facts 
concerning,  386. 

Geometric  mean,  use  of  the,  in  stock 
index  numbers,  365-366. 

Graphic  forms,  choice  of,  274-276. 

Graphic  method,  as  a  measure  of 
correlation,  400 ;  limitations  of 
the,  282-283  ;  nature  of  the,  282 ; 
purposes  of  the,  386. 

Graphic  presentation,  rules  for,  273- 
276 ;  standards  and  rules  for, 
contrasted,  277  ;  statistical  stand- 
ards in,  276-277. 

Graphics, limitations of the natural
scale in, 283-285 ; logarithmic
scale in, 282-305 ; use of, in
commercial research, 43-44. (See
Logarithmic Diagrams.)

Group  facts  vs.  unit  facts,  21. 

Groups,  attributes  of,  369-372; 
statistics  gives  knowledge  of  com- 
position of,  369-370;  use  of,  in 
tabulating  wages,  319-323. 

Homogeneity,  units  and,  151-154. 

Index  numbers,  bases  for  weighting, 
in  stock,  361-364 ;  computation  of, 
by  the  Bureau  of  Crop  Estimates, 
350-354;  "general-purpose,"  con- 
trasted with  "specific-purpose," 
359 ;  plotting  of,  on  logarithmic 
diagrams,  302-304 ;  steps  in  com- 
puting, 351-354;  stock  and  com- 
modity, contrasted,  355-357 ;  uses 
of  stock,  357-359 ;  weighting 
stock,  360-364. 

Index  numbers  of  stock,  limitation 
of  the  "chain"  type,  365-366; 
method  of  computing,  and  the 
purpose  of,  364-365 ;  use  of  the 
geometric  mean  in,  365-366. 

Index  numbers  of  stock  prices, 
354-367.

Industries,  bases  of  grouping  of,  195- 
197. 

Injury,  as  a  statistical  unit,  161. 

Interpretation,  statistical  standards 
of, 416-418.

"Laboratory" method in advertising
policies, 121-123.

Labor  turnover,  and  unit  measure- 
ment, 24. 

Large  numbers,  the  logic  of,  331-334. 

Linear  correlation,  how  shown,  402- 
404. 

Log  scales,  use  of,  and  accuracy, 
91-95.

Logarithmic  diagrams,  measurements 
of  slopes  on,  298-300 ;  properties 
of,  288-297 ;  use  of,  for  comparing 
large and small quantities, 300-
302 ; use of, for plotting index
numbers, 302-304.

Logarithmic scale, advantages of
the,  282-305;  defined,  285; 
mathematical  principle  of  the, 
illustrated,  285-286;  use  of  the, 
in  diagrams,  287-288. 

Maps, rules for drawing statistical,
275-276.

"Market," statistical aspects of
the, 111-113.

Market contour, explained, 112-113.

Market development, study of, by
sampling, 111-124.

Market distribution, choice of
methods in, determined statisti-
cally, 113-123.

Market strata, price policies and, 112.

Market surveys, questions to be
asked in, 44-45 ; to be made by
whom, 41-42.

Markets, statistical study of, 38-46.

Measurement of factory output,
conditions necessary to the, 129-
137.

Measurements, characteristics of,
units in statistical, 150-159.

Measurements of logs, accuracy of,
91-95.

Median, defined, 325-326; graphic
presentation of the, 387-389 ;
limitations of the use of the, 327-
329.

Method, causation and the statistical,
374-378.

"Normal," actual yield in crop
reporting  and  the,  85-86  ;  averages 
in  crop  reporting  and  the,  82-84 ; 
criticism  of  use  of  the,  81-82 ; 
the,  in  crop  reporting,  80-86. 

Numbers,  rounding  of,  in  derivative 
tables,  255 ;  rounding  of,  in  tables, 
267 ;  rounding  of,  in  tabulation, 
255-257. 

Payrolls, as a source of wage data,
197-198,  199-201. 


Percentages,  use  of  cumulative,  in 
wage  studies,  323-325. 

Probable  error,  correlation  coeffi- 
cient and  the,  412-413 ;  defined, 
412;  defined  and  illustrated,  381- 
383. 

Production,  statistical  series  on, 
59-61. 

Quartiles,  defined,  326 ;  limitations 
of  the  use  of,  327-329. 

Questionnaire,  illustration  of  a,  236- 
238,  239,  240;  points  to  be  con- 
sidered in  the  use  and  form  of  a, 
224-229. (See Schedules.)

Rates,  industrial  accident,  164-184 ; 
meaning  of  accident,  166-167 ; 
meaning  of  accident  frequency, 
167-169 ;  basis  for  computation 
of  wage,  205-208. 

Ratio,  car-seat  mile  as  a,  344-347; 
the  coefficient  of  dispersion  as  a, 
387. 

Ratios,  industrial  accidents  ex- 
pressed as,  164-184  ;  rounding  of, 
256 ;  as  coefficients,  163-164. 

Relativity,  units  and,  157-158. 

Research,  questions  answered  by 
commercial,  40-41. 

Salaries,  as  a  statistical  unit,  24. 

Salesmen  in  market  surveys,  41. 

Samples,  industrial,  in  wage  studies, 
193,  198-199. 

Sampling, acreage estimates and,
79-80;  geographical,  in  wage 
studies,  193 ;  method  of,  in  com- 
mercial research,  44 ;  method  of, 
in  market  development,  111-124; 
of  coal,  62-64 ;  use  of,  in  timber 
estimates,  96-101 ;  use  of,  method 
in  testing  markets,  119-121.  (See 
Estimates,  Method  of.) 

Scale,  advantages  of  the  logarithmic, 
282-305 ;  logarithmic,  defined, 
285  ;  logarithmic,  illustrated,  286  ; 
use  of  logarithmic,  in  diagrams, 
287-288 ; use of the natural, in
diagrams, 283-285; zeros in the,
276. 

Scale  units,  273. 

Schedules,  illustrations  of,  236-238, 
239,  240 ;  type  of,  used  in  wage 
study,  194;  editing  of,  229-236; 
editing of, for accuracy, 229-232 ;
editing  of,  for  consistency,  230, 
232-234 ;  editing  of,  for  complete- 
ness, 235-236 ;  editing  of,  correc- 
tive, 229 ;  editing  of,  for  uni- 
formity, 234-235;  editing  of, 
formal,  229 ;  points  to  be  con- 
sidered in  the  use  and  form  of, 
224-229 ;    tabulation  from,  249. 

Science,  citizenship  and,  5-6 ;  classi- 
fication of  facts  and,  6 ;  essence 
of, 6 ; essentials of good, 8 ff. ;
method and, 8 ; need for appre-
ciation of, 2-5 ; the function of,
6 ; the scope of, 10 ff. ; unity of,
is  in  its  method,  10. 

Scientific  method,  citizenship  and, 
7-8 ;  general  application  of,  6 ; 
in  analysis  of  business  cycles,  35- 
37. 

Series,  comparison  of  time,  29-30 ; 
time,  and  tabulation,  203-204 ; 
measure  of  variability  of  a,  387 ; 
smoothing  of  continuous,  279 ; 
smoothing  of  discrete,  279 ;  statis- 
tical, of  production,  59-61. 

Severity,  measure  of,  in  accident 
statistics,  170-181. 

Severity  rates,  illustrations  of  uses 
of,  177-184. 

Smoothing  curves,  justification  of, 
280-282  ;    object  of,  279-280. 

Standardization  of  statistical  tables, 
259-268. 

Standards,  interpretation  of  facts 
and  statistical,  416-418;  statis- 
tical, in  tabulation,  269-270 ;  use 
of,  in  graphic  presentation,  276- 
277. 

Statistical  department  in  business, 
32-33. 

Statistical  investigation,  stages  in, 
24. 


Statistical  knowledge,  nature  of, 
369-384. 

Statistical  method,  causation  and, 
374-378 ;  essentials  of,  15 ;  func- 
tion of,  summarized,  384 ;  a 
knowledge  of  determinative  causes 
and,  380-381 ;  position  of,  in  the 
sciences  not  independent,  384 ; 
results  of,  summarized,  372 ;  vs. 
the a priori, 115 ff. ; use of, for
prediction,  372-384  ;  uses  of,  369  ; 
content  of,  23-24. 

Statistical  probabilities,  379-384. 

Statistical  standards,  in  the  inter- 
pretation of  facts,  416-418;  in 
tabulation,  269-270;  in  graphic 
presentation,  276-277. 

Statistical  tables,  definition  of,  247 ; 
use  of,  244-247. 

Statistical  units,  homogeneity  of, 
24-25. 

Statistician,  qualifications  of  a,  18- 
19. 

Statistics,  as  master  facts,  22 ;  bear- 
ing of,  on  the  railroad  problem, 
17-18 ;  business  planning  by  use 
of,  27-29 ;  cooperation  in  the 
development  of,  210-224;  cost 
accounting  and,  31-32  ;  definition 
of,  22,  33,  243 ;  description  of  a 
market  by,  111-113;  doubt  as  to 
meaning  of,  14-15 ;  errors  in  use 
of  business,  28-29 ;  establishment 
of  cause  and  effect  relations  by 
use  of,  16-17 ;  general  purpose, 
14 ;  importance  of,  in  business, 
33-34 ;  interpretation  of,  15 ; 
knowledge  which,  gives,  369- 
372 ;  limits  of,  396 ;  nature  and 
purpose  of,  in  business,  22 ;  part 
played  by,  in  modern  problems, 
14 ;  relation  of,  to  groups,  369- 
371 ; series of production, 59-61 ;
source  of,  on  shipping,  214-218; 
use  of,  as  a  means  of  control,  212- 
214;  use  of,  for  planning  purposes, 
210-224 ; use of, in controlling pur-
chases, 21 ; use of, in locating retail
stores, 20-21 ; use of, to determine
method of market distribution,
113-123. 

Statistics  in  business,  20-34 ;  prac- 
tical objects  of,  26-31. 

Statistics  of  accidents,  purposes  of, 
161-162. 

Statistics  of  unemployment,  47-57 ; 
conclusions  to  be  drawn  from,  55-57. 

Strikes,  relation  of  earnings  to,  192. 

Stub,  function  of  the,  in  tables,  244- 
245,  246 ;  order  of  details  in  the, 
244-245 ;  relation  of  the,  to  cap- 
tion headings,  246  ;  relation  of  the, 
to  classification,  246-247 ;  use  of, 
in  derivative  tables,  245-246. 

Swift  and  Company,  commercial  re- 
search department  of,  42-43. 

Table,  definition  of  a  statistical, 
247 ;  purpose  of  a  statistical, 
243,  246  ;  statistical,  defined,  242- 
243.      (See  Tabulation.) 

Tables,  advantages  of,  243-244 ; 
definition  of  general,  253 ;  deriv- 
ative, and  comparability,  251- 
252 ;  general,  contrasted  with 
derivative,  253-255 ;  nature  of 
general-purpose,  261-264 ;  nature 
of  the  special-purpose,  261,  264- 
266  ;  necessity  of  analysis  of,  254- 
255 ;  numbering  of,  253 ;  order 
of  details  in,  263-264,  266;  posi- 
tions of  totals  in,  266 ;  purpose 
of  the  columns  in,  261-266; 
purpose  of  the  rows  in,  261-266 ; 
relation  of  caption  headings  to, 
246 ;  rounding  of  numbers  in, 
255-257 ;  rules  for  constructing 
statistical,  244 ;  standardization 
of  the  construction  of,  259-268 ; 
stub  and  caption  items  in,  262- 
263;  the  stub  in  statistical,  244- 
245 ;  use  of  samples  in,  251-252  ; 
use  of  statistical,  244-247. 

Tabular  forms,  standards  in  the 
construction  of,  261-268. 

Tabular   notation,   257-258. 

Tabular  presentation,  242-272 ; 
limitations  upon,  247-250. 


Tabulation,  alternative  vs.  complete, 
249-250 ;  compactness  as  an  essen- 
tial in,  252-253 ;  comparability  as 
an  essential  in,  251-252 ;  compre- 
hensiveness as  an  essential  in, 
250-252  ;  essentials  of  good,  250- 
253 ; limitation upon complete,
248-249; meaning of, 269 ; "mis-
cellaneous" columns and, 252-
253  ;  nature  of,  242-244  ;  relation 
of,  to  classification,  269  ;  standards 
in,  260;  statistical  standards  in, 
269-270;  time  unit  groups  in, 
202-203;  use  of  groups  in,  319- 
323  ;    wage  groups  in,  202. 

Time  series  compared,  29-30. 

Titles,  position  of,  in  diagrams,  274. 

Totals,  position  of,  in  tables,  266- 
267;  in  derived  tables,  266-267; 
use  of,  in  general  and  in  derivative 
tables,  254. 

Tuberculosis,  statistics  of  treatment 
for,  15. 

Unemployment, relation of earnings
to,  192  ;  relation  of  wage  rates  to, 
57  ;  sources  and  types  of  statistics 
on,  47-57;  state  departments  of 
labor  as  sources  of  statistics  on, 
47-49 ;  unions  as  sources  of  in- 
formation on,  49-55. 

Unit,  accident  frequency  rate  as  a, 
167-169 ;  accident  severity  rate 
as  a,  169-184 ;  an  accident  as  a, 
165 ;  days  lost  as  a  statistical 
unit,  179-181 ;  full-time  worker  as 
a  statistical,  167-168 ;  how  to 
measure  man-hours  as  a  statistical, 
168-169  ;  man-hour  as  a  statistical, 
168;  mile  of  track  as  a,  160;  300- 
day  worker  as  a  statistical,  168 ; 
the  ton-mile  as  a,  187 ;  the  train- 
mile  as  a,  186-189 ;  use  of  train- 
mile  as  a,  188. 

Unit  facts  vs.  group  facts,  21. 

Units, accuracy of, 158-159 ; ac-
curacy in defining, 144-145 ; char-
acteristics of, necessary to statis-
tical measurement, 150-159 ; com-
parability a characteristic of, 156 ;
compound,  25  ;  definitions  of,  24  ; 
frequency  rates  and  severity  rates 
contrasted  as,  180;  homogeneity 
of,  151-154 ;  industrial,  in  wage 
studies,  196-197 ;  log-scales  as, 
and  accuracy,  91-95 ;  place  of, 
in  statistics,  161 ;  relativity  a 
characteristic  of,  157-158 ;  simple, 
25 ;  stability  a  characteristic  of, 
155-156 ;  statistical,  and  homo- 
geneity, 24-25 ;  statistical,  in 
business  illustrated,  24-25;  uni- 
formity of,  in  measuring  factory 
output,  125-126 ;  universality  a 
characteristic of, 154-155 ; uni-
versality of, through inclusive
data,  154-155 ;  universality  of, 
through  samples,  155. 

Wage  data,  pay  rolls  as  source  of, 
197-198,  199-201 ;  representative 
character  of,  198-199. 

Wage  rates,  rules  for  computation 
of,  205-208. 

Wages,  as  a  statistical  unit,  24 ; 
definition  of,  192 ;  difficulties  of 
international comparison of, 398-
400 ; grouping of, in tabulation,
202 ;  interpretation  of,  from  pay 
rolls,  201-202;  meanings  of,  398- 
399  ;  measurement  of,  as  earnings, 
192-193 ;  measurement  of,  as 
rates,  192-193 ;  method  of  study, 
191-209 ;  piece  basis  for  paying, 
204-205 ;  relation  of  unemploy- 
ment to,  57 ;  statistics  necessary 
on,  15 ;  study  of,  by  sampling, 
193-195, 198-199 ; time basis for
paying,  204-205 ;  earnings  and, 
398. 

Weighted  average,  computation  of  a, 
illustrated,  330-331 ;  use  of  a,  in 
crop  reporting,  329-331. 

Weighted  index  number,  350-354. 

Weighting,  bases  for,  in  stock  index 
numbers,  361-364 ;  haphazard, 
361-362. 

Weights,  significance  of  relative,  179. 

Zero, the horizontal, in frequency
diagrams, 385-394.

Zero line, absence of a, in logarithmic
diagrams, 296, 297 ; necessity of
a, in natural scale diagrams, 296-
297.


Printed in the United States of America.


